Anomaly Detection in Industrial Networks
using a
Resource-Constrained Edge Device
Anton Eliasson
Computer Science and Engineering, master's level
2019
Luleå University of Technology
Department of Computer Science, Electrical and Space Engineering
ABSTRACT
The detection of false data-injection attacks in industrial networks is a growing challenge for the industry because it requires knowledge of application- and protocol-specific behaviors. Profinet is a communication standard widely used in the industry and is susceptible to this type of attack. This motivates an examination of whether a machine learning solution focused on anomaly detection can be implemented and used to detect abnormal data in Profinet packets. Previous work has investigated this topic; however, no solution is available on the market yet. Any solution that aims to be adopted by the industry must detect abnormal data at the application level and run the analytics on a resource-constrained device. This thesis presents an implementation that aims to detect abnormal data in Profinet packets represented as online data streams generated in real-time. The implemented unsupervised learning approach is validated on data from a simulated industrial use-case scenario. The results indicate that the method manages to detect all abnormal behaviors in an industrial network.
PREFACE
First of all, I want to thank my family for the support you have given me, not only during
this thesis but also during my entire study period at LTU.
This thesis work was carried out in collaboration with HMS Networks, where I especially
would like to thank my supervisor, Henrik Arleving, for your ideas, help, and feedback
throughout the project. I also want to thank the rest of the members of the company
who helped me, especially Mattias Svensson who assisted me with technical problems
with the software and hardware at the beginning of the project.
I would also like to thank my supervisor at LTU, Sergio Martin Del Campo Barraza.
Your quick responses and feedback have been amazing and have helped me a lot in finishing the report. Your knowledge in machine learning has been really valuable as well, and you have taught me a lot during these months.
I also want to take the opportunity to thank the team at Stream Analyze for guiding
me with the analytics platform, especially Johan Risch. Your help has been incredible.
Anton Eliasson
CONTENTS
Chapter 1 – Introduction
1.1 Background
1.1.1 Attacks on industrial systems
1.1.2 Intrusion detection in Industrial Networks
1.1.3 Network Analytics
1.2 Related work
1.3 Problem formulation
1.3.1 Delimitations
1.4 Thesis outline
Chapter 2 – Theory
2.1 Profinet
2.1.1 Profinet system model
2.1.2 Profinet Cyclic IO Data Exchange
2.1.3 Abnormality in Profinet packets
2.2 Stream Processing
2.3 Feature Engineering
2.3.1 Features
2.3.2 Selection of features
2.4 Anomaly Detection
2.5 Machine learning algorithm for anomaly detection
2.5.1 DBSCAN
Chapter 3 – Methodology
3.1 Tools
3.1.1 Hardware
3.1.2 Software
3.2 Architecture
3.3 Generation of network data
3.4 Data processing
3.4.1 Data collection
3.4.2 Pre-processing
3.4.3 Feature extraction
3.5 Creation of training and validation sets
3.6 Implementation of machine learning algorithm
3.7 Validation method
3.7.1 Validation of clustering model
3.7.2 Validation of the online data stream case
Chapter 4 – Results
4.1 Clustering method
4.1.1 Scenario A
4.1.2 Scenario B
4.1.3 Online data stream case
Chapter 5 – Discussions, Conclusions and Future Work
5.1 Discussions and Conclusions
5.2 Future Work
Appendix A – Tables
CHAPTER 1
Introduction
There is a pressing need in the industry to protect industrial control systems from attacks. False data-injection attacks, where application data in network packets is modified with the aim of interrupting the normal behavior of an industrial process, are one type of attack that can cause major problems if not detected in time. Intrusion detection systems need to detect these attacks as early as possible; therefore, the analysis of network traffic must occur over continuous sequences of data in real-time. Moreover, it is difficult to obtain knowledge about all kinds of existing attacks, especially unknown attacks that have never been seen before. For these reasons, a solution based on anomaly detection and unsupervised machine learning is implemented in this project, which aims to detect abnormal data in industrial network packets. The performance of the anomaly detection solution is investigated, and conclusions are drawn to determine whether it is possible to detect abnormal data in an industrial network using deep packet inspection and machine learning. The examination of the proposed solution is extended to evaluate the performance of the machine learning implementation when it runs on a resource-constrained edge device.
1.1 Background
1.1.1 Attacks on industrial systems
During the development of industrial control systems, security has historically not been a priority. The reason is that factory floors have been isolated from the outside world, so there was little need to secure the systems against attacks and intrusions. However, interest in security and malware detection capabilities is increasing considerably due to the growing demand for devices that are part of the Industrial Internet-of-Things (IIoT).
One example of real-world malware is the well-known computer worm Stuxnet, which attacked Iran's nuclear plant in 2010 [1]. Stuxnet targeted industrial controllers, specifically Siemens programmable logic controllers (PLCs), used to control industrial processes such as the centrifuges for the separation of nuclear material. The program that controls and monitors the PLCs, called Step 7, uses a specific library for the communication with the PLCs. The worm managed to take control of this library and was able to manipulate requests sent to the PLCs. Malicious code was then created and introduced to target specific PLCs. The payload of the data sent from the PLCs to the centrifuges was modified to set the speed much lower or higher than during regular operation. A slow speed causes the uranium enrichment process to run inefficiently, while a high speed can potentially destroy the centrifuges. Before the attack was carried out, the malware recorded the centrifuges' speed during normal operation. The recorded data was then fed to the monitoring program, WinCC, which prevented the system from alerting on anomalous behavior and made the abnormal speed difficult to detect [2].
Other attacks such as Denial-of-Service (DoS), replay attacks, and deception attacks
have recently received attention in industrial systems as well. A DoS attack aims to drain the resources of the network to make them unavailable and to prevent the devices on the network from communicating with each other [3]. Replay attacks intercept a valid data transmission and then repeat the transmission with the purpose of fetching the same information as the original data transmission [3]. In deception attacks, also known as false data-injection attacks, the data in the transmitted packets is modified, violating its integrity. Stuxnet is one example of such an attack [3]. False data-injection attacks are harder to detect and are not as frequently investigated as DoS attacks [4]. Other types of existing attacks on industrial systems are eavesdropping, Man-in-the-Middle attacks, and various types of worms, trojans, and viruses [5]. Detection of these attacks in an industrial environment is crucial in order to have a secure system. Depending on the type of attack, the detection method may differ. This work focuses on the detection of false data-injection attacks.
1.1.2 Intrusion detection in Industrial Networks
Intrusion detection systems for industrial systems can be divided into two categories: network-based and host-based [6, 7]. Network-based systems collect and analyze the entire network communication, while host-based solutions identify intrusive behavior on each individual node. As network-based intrusion detection systems only need to be installed at one point in a network and can analyze the inbound and outbound traffic of all configured devices on the network, they are often more suitable for automation networks [7]. In addition, host-based intrusion detection systems require additional memory and computing resources, which can affect the industrial process.
Each of these two methods can, in turn, be either misuse-based or anomaly-based.
Misuse-based intrusion detection systems compare the incoming data with predefined signatures, which is why they are also known as signature-based intrusion detection systems. Anomaly-based intrusion detection systems, in contrast, compare the current behavior against a learned normal behavior and try to identify abnormal patterns [6]. The main difference between signature-based and anomaly-based systems lies in the concepts of attack and anomaly. An attack is an operation that aims to put the security of a system at risk [8], while an anomaly, in a network security context, is a behavior suspected to pose a security risk because it deviates from the historical behavior. Signature-based systems are very good at detecting predefined attacks but lack the ability to detect unknown and unseen behavior. Anomaly-based detection techniques have the potential to detect these unseen events. Due to the increase in new unknown attacks on modern industrial control systems, there is great interest in analyzing network traffic with an anomaly detection procedure, which permits the identification of not only known threats but also abnormal behavior that has not been seen before.
False data-injection attacks can only be detected with deep packet inspection, where the payload of the packets sent over the network is explored [9]. Since industrial systems include multiple different communication protocols and standards, each protocol needs its own analysis, and the packet inspection may therefore differ depending on the protocol. Profinet is one of the communication standards used in the industry and is one of the emerging protocols found in many industrial applications today [7]. The protocol communicates over Industrial Ethernet and is one of the leading Industrial Ethernet standards in the market [10]. The Profinet application data in the packets sent over the network varies depending on the application. Returning to the example of the centrifuges in Iran's nuclear program, the actual application data of the Profinet packets could be the desired speed of the operating motor.
Anomaly-based intrusion detection systems in information technology (IT) systems are
not so common in practice. The reason is the dynamic behavior of regular IT systems,
which makes it difficult to define a proper model for normal behavior. For industrial
networks, however, communication is often much more structured and steady [7]. These
considerations motivate our work on intrusion detection systems that can inspect indus-
trial communication protocols on the application level with deep packet inspection and
detect anomalies in the network data.
1.1.3 Network Analytics
Where the analytics of network data, and data in general, is run matters for several concerns. Latency and network bandwidth, for example, are affected differently depending on where the analytics runs.
Virtually unlimited resources in the cloud have resulted in the emergence of many different cloud services. Running analytics in the cloud requires that the data to be
analyzed is sent from the data source to the server continuously. An alternative is to
run analytics in the local network, closer to the data source. Edge computing is the concept of running analytics directly at the network edge [11]. Ahmed [11] compares the pros and cons of cloud computing and edge computing. Cloud computing has the benefit that hardware capabilities are scalable, resulting in practically unlimited resource capabilities. The disadvantage of cloud-based solutions is that latency and jitter are often high, and sending continuous data to the cloud requires a constantly high bandwidth. Edge-based solutions, on the other hand, can maintain low latency, since the traffic to the cloud can be reduced. Edge computing is therefore suitable for applications that require analytics on great volumes of data in real-time. The drawback is that edge devices are often resource-constrained.
Detecting anomalies in network data requires analytics on large data volumes in real-time. This thesis work investigates an edge-based solution, where the detection of anomalies in network traffic is made on a resource-constrained edge device.
1.2 Related work
Numerous studies have been made on information security and intrusion detection systems in general [12]. Wressnegger et al. [13] discuss the need for protocol specifications to analyze the content of the data in industrial networks and present a content-based anomaly detection framework for binary protocols. Their method manages to detect 97.1 % of the attacks in their dataset with only 2 false alarms out of 100,000 messages.
Most of the research within network-based intrusion detection systems on industrial net-
works is focused on detecting anomalies in the traffic flow characteristics, such as through-
put, port number, and IP addresses. Mantere et al. [14] analyze possible features for use
in a machine learning-based anomaly detection system. Their research is, however, limited to IP traffic and does not consider the actual payload of the traffic.
A few papers focus specifically on the Profinet standard. Sestito et al. [15] present a method for detecting anomalies in Profinet networks. Their method uses an Artificial Neural Network (ANN) to classify the incoming data into four different classes, one of which is normal operation. The authors derive 16 different traffic-related features used for the ANN and conclude that their methodology may be successful for anomaly detection in any Profinet network. Their method is, however, based on supervised learning and requires labeled data for classification. Schuster et al. [9] also present an approach for anomaly detection in Profinet. Unlike the method above, their model analyzes all network data, including flow information, application data, and packet sequences, by performing deep packet inspection. They extract features from the actual packet data, such as MAC address, packet type, and packet payload. Schuster et al. [16] present the results of applying the one-class SVM algorithm for detecting anomalies in Profinet packets.
The work in this thesis focuses on similar approaches, with deep packet inspection and unsupervised anomaly detection methods. Additionally, the method presented in this work is based on a stream-based approach where the machine learning algorithm performs the analytics online on the incoming data. Mulinka and Casas [17] compare different stream-based machine learning algorithms for the case of detecting abnormal network traffic in an online manner.
1.3 Problem formulation
Analyzing industrial networks with deep packet inspection is not completely straightforward. One of the key challenges is the need for specific knowledge about each individual protocol. Today, no solution exists on the market that can inspect industrial network packets at the payload level, detect malware with an anomaly detection approach, and, in addition, run the analytics on a resource-constrained device. Being able to detect unknown malware while running the analytics locally on the edge would have many benefits.
HMS Networks AB, a Swedish company from Halmstad, supplies products and solutions
for industrial communication and the Industrial Internet-of-Things. The company has
a large amount of experience in industrial communications and is a leader in providing
software that connects industrial devices to different industrial networks and IoT systems.
HMS constantly strives to prototype new solutions and explore possibilities in the market. With the new technology in edge computing and machine learning, the company wishes to examine the possibility of designing and implementing a machine learning-based solution running on a resource-constrained edge device from HMS.
Detection of anomalies in an industrial protocol is an interesting use case for both
the company and the industry. Since HMS has deep knowledge about specific industrial
protocols, the company wants to examine the possibility of using machine learning to
detect anomalies in industrial network packets. This project is narrowed down to the
PROFINET standard because the way of detecting anomalies is not the same for all
protocols. Therefore, the goal of this project will be to implement and evaluate a machine
learning solution to detect anomalies in Profinet packets. The solution should run on
one of HMS resource-constrained edge devices. In summary, the aim of the project is to
investigate and answer the following research questions:
Q1 Can abnormal data in an industrial network be detected using Deep Packet Inspec-
tion and Machine Learning?
Q2 To what extent, as it relates to performance, speed, and system footprint, can an
unsupervised anomaly detection algorithm be useful on a resource-constrained edge
device?
Q3 Is the implementation of the solution feasible on an existing device available on the market?
1.3.1 Delimitations
As described in the problem formulation, the intention of the thesis is to investigate whether it is possible to detect abnormal data in industrial networks. Unfortunately, it will not be possible to test and validate the implemented solution on network data generated from an actual industrial environment. Therefore, data will be simulated. Real-world network data would be needed to deploy the implementation as a complete solution in production.
1.4 Thesis outline
This thesis is structured as follows: Chapter 2 provides the background theory about
the different elements used in the proposed solution, such as theoretical information
about Profinet, stream processing, feature engineering, anomaly detection, and machine
learning. Chapter 3 describes the method used in the implementation of this work, the tools used, and how they fit together. This chapter also provides
a description of how the different parts in the solution are generated, implemented and
validated. Chapter 4 presents the results of the proposed solution, while Chapter 5
provides conclusions to the research questions stated in the problem formulation and
future work.
CHAPTER 2
Theory
Machine learning has existed for several decades; however, only recently has it gained widespread popularity, and many of its methods have become possible to implement in real-world applications. Machine learning can be described as the process of using
algorithms that learn from data to predict outcomes on later observations [42]. This is
in contrast to algorithms that are explicitly programmed by humans.
A predictive machine learning model requires data for its training. When the model has been trained and is provided with inputs, it makes predictions based on the data it was trained with. In the case where detecting abnormal data in industrial
networks is the goal, the model is trained with data from network traffic, and the model
determines whether new incoming data is normal or abnormal. The inputs to the model, also
known as features, must be extracted from the data source. This process is called feature
extraction and is a crucial step in the design of a machine learning solution.
The first step in the design of a machine learning solution is data collection, which
in this project is data in the form of Profinet packets. Data processing includes data
collection, pre-processing and feature extraction. The processed data can then be used
as input to train a model, which finally can make a decision based on future inputs.
2.1 Profinet
Industrial Ethernet technology has recently increased in popularity by offering higher speed, longer connection distances and the ability to connect more nodes than the traditional serial Fieldbus protocols on the factory floor [21]. Among several Industrial
Ethernet standards, Profinet is one of the most common in the industry today, used in
solutions such as factory automation, process automation, and motion control applica-
tions. Depending on the type of functionality and requirements for the data transmission
over the network, Profinet offers two variants of functionality. The first one, defined as
Profinet CBA (Component Based Automation), is suitable for component-based machine-to-machine communication via TCP/IP. The other variant is Profinet IO, used for data
exchange between controllers and devices. This thesis will focus only on Profinet IO.
2.1.1 Profinet system model
A Profinet IO system consists of the following different device classes that communicate
with each other:
IO-Controller A Profinet IO-Controller is typically the Programmable Logic Controller
(PLC) where the control program runs. The IO-Controller exchanges information with the IO-Devices in the network, acting as a provider of output instructions to the devices and a consumer of input data from the devices [18, 20].
IO-Device Profinet IO-Devices are distributed I/O field devices that can exchange data
with one or several IO-Controllers [20].
IO-Supervisor An IO-Supervisor can be a personal computer (PC), programming device (PG), or a human-machine interface (HMI). The purpose of an IO-Supervisor can be commissioning or diagnostics [20].
A communication path between an IO-Controller and an IO-Device must be established before they can exchange data, which is done during system startup [20].
When the IO-Controller is initialized, it sets up a connection, called Application Relation
(AR), to each IO-Device using Distributed Computing Environment / Remote Procedure
Calls (DCE RPC) [33]. The AR specifies Communication Relations (CR) where specific
types of data are sent. The different CRs that exist in an AR are Record data CR,
IO data CR, and Alarm CR. Figure 2.1 illustrates the application and communication
relations.
Figure 2.1: AR and CR between IO-Controller and IO-Device. Picture taken from [20].
2.1.2 Profinet Cyclic IO Data Exchange
Profinet provides services such as the cyclic transmission of I/O data (RT and IRT),
acyclic transmissions of data (parameters, detailed diagnostics, etc.), acyclic transmission
of alarms, and address resolution [20]. However, this project only deals with the cyclic IO data exchange, where data is sent from an IO-Device to an IO-Controller. The cyclic transmission of I/O data in Profinet IO occurs in the IO data CR, where cyclic data is sent between an IO-Controller and an IO-Device. The data is always transmitted in real-time according to the definitions in IEEE and IEC for high-performance data exchange of
I/O data [20]. The real-time communication in Profinet IO is separated into four classes,
as illustrated in Table 2.1.
RT CLASS 1 is used for unsynchronized communication within a subnet, whereas
RT CLASS 2 can be used for either unsynchronized or synchronized communication.
RT CLASS 3 supports Isochronous Real-Time with clock rates of under 1 ms and jit-
ter below 1 µs. The last class, RT CLASS UDP, uses unsynchronized communication
between different subnets. This project will deal with Profinet IO real-time class 1.
Profinet communication occurs in the data link layer, using the Ethernet protocol, according to the Open Systems Interconnection model (OSI model). An Ethernet frame in Profinet, illustrated in Figure 2.2, consists of a 16-byte header block containing the destination address, source address, Ethertype, and Frame ID. The Ethertype is set to 0x8892, which indicates that the protocol used in the payload is Profinet. The Frame ID differentiates the Profinet IO service used; for cyclic data exchange with real-time class 1, the values are between 0x8000 and 0xBBFF. The payload, normally with a size between 40 and 1500 bytes, is the application data sent between an IO-Controller and IO-Device. The cycle information, called the cycle counter, sets the update time of the cyclic data sent from the provider [18]. The frame also includes status information, used for validation of the data status and transfer status in the cyclic exchange.
Table 2.1: Real-time classes in Profinet IO.
Real-time classes in Profinet IO
Class Functionality
RT CLASS 1 Unsynchronized communication within a subnet.
RT CLASS 2 Unsynchronized or synchronized communication.
RT CLASS 3 Isochronous Real-Time communication.
RT CLASS UDP Unsynchronized communication between different subnets.
Figure 2.2: Profinet frame. Picture taken from [19].
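To make the frame layout concrete, the following minimal Python sketch parses the 16-byte header described above and checks whether a raw Ethernet frame carries cyclic Profinet IO data (real-time class 1). It is an illustration only; the field names are chosen for readability, and the constants follow the values given in this section.

    ETHERTYPE_PROFINET = 0x8892                     # Ethertype indicating Profinet
    RT_CLASS1_MIN, RT_CLASS1_MAX = 0x8000, 0xBBFF   # Frame ID range for RT class 1

    def parse_profinet_header(frame: bytes) -> dict:
        # Split the 16-byte header block into its fields (Figure 2.2).
        return {
            "dst": frame[0:6],                       # destination MAC address
            "src": frame[6:12],                      # source MAC address
            "ethertype": int.from_bytes(frame[12:14], "big"),
            "frame_id": int.from_bytes(frame[14:16], "big"),
            "payload": frame[16:],                   # application data + trailer
        }

    def is_cyclic_io_frame(frame: bytes) -> bool:
        # True if the frame carries Profinet cyclic IO data (RT class 1).
        if len(frame) < 16:
            return False
        h = parse_profinet_header(frame)
        return (h["ethertype"] == ETHERTYPE_PROFINET
                and RT_CLASS1_MIN <= h["frame_id"] <= RT_CLASS1_MAX)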
2.1.3 Abnormality in Profinet packets
The meaning of abnormal data in Profinet packets depends on the application and requires a definition of normal behavior. A common element of all false data-injection attacks is that the contents of the packets differ from the normal case. The bytes in the payload are modified to cause damage or to interfere with the regular industrial process.
The main challenge in detecting anomalies is that the detection system does not know the significance of each byte in the packet. Expanding on the Stuxnet example, where the speed of the centrifuges was changed, the speed is represented by a specific number of bytes in the payload. Detecting the abnormal behavior of the speed requires that the detection system knows where in the payload the speed data is located. This is not possible if the detection system is supposed to work for the general case with many different applications. Therefore, a more general approach is studied in this project, where the aim is to detect deviations in the behavior of the packets.
An industrial process is often quite static in normal operation. As long as no unexpected behavior occurs in the process, the inputs and outputs of a PLC stay within a limited interval. During an attack, however, the static operation is disrupted and the interval of the input and output values is likely to widen. As a result, the range of value combinations that the bytes in the payload can take increases, meaning more variation in the payload. Therefore, we make two assumptions for this project:
1. The payload of the packets varies little during normal operation resulting in a
limited number of combinations.
2. The payload of the packets varies more during an attack resulting in additional
variations in the data that would not appear during normal operation.
These assumptions are backed by experts at the company. The detection method will use them to detect abnormal behavior. Abnormal behavior at the packet level in an industrial process is thus related to how much the content varies during a time period.
One drawback of anomaly-based intrusion detection systems is the high number of false alarms. Not all anomalies are equivalent to an attack. For example, a fast increase in user activity in a network may be the result of a DoS or worm attack, but it can also result from the installation of new software during normal network operation [34]. Furthermore, abnormal data in Profinet does not necessarily mean that there is an attack either. It could be due to physical disturbances in the sensors that disrupt the control loop in a PLC.
However, the aim of the system in this project is to detect abnormal behavior. Further analysis, outside the scope of this project, is required to decide whether the detected abnormal behavior is an attack or not.
2.2 Stream Processing
A data stream can be described as a continuously growing sequence of data items [24].
Data streams exist in many different shapes in the real world, where some examples are
sensor data, network traffic, and sound waves. Many applications today require analyzing incoming data streams in an online manner without actually storing the data.
Anomaly detection in network traffic by collecting all data and storing it in a traditional
database for offline analytics is not practical or sufficient when the goal is to detect the
anomalies in real-time. Furthermore, it is not always possible to store the large volume
of data that is required for processing because of constraints in capacity. The continuous
data used in many applications is often massive, unbounded and evolves over time [27].
Storing all incoming data in the memory of the device where the analytics takes place
may not always be feasible, especially if the device is resource-constrained. A tool that makes it possible to query data directly on the incoming streams, without storing all the data, is often necessary. A Data Stream Management System (DSMS) is software that can process queries over continuous incoming data streams, in contrast to a DBMS (database management system), which works only on static data. The queries are often written in a scripting language similar to SQL and can be used to perform various kinds of filtering, calculations, and statistics on the streaming data. The queries over
the data streams are continuous queries (CQs), which means they run continuously until
stopped by the user and produce a streaming result as long as the queries are active [25].
This is in contrast to traditional queries on databases where the queries are executed
until the requested data is delivered.
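To give a rough feel for the continuous-query idea, the sketch below models a CQ in Python as a generator pipeline: the query is attached to an unbounded stream and produces results for as long as the stream delivers items. This is only a conceptual analogy to a DSMS, not the OSQL interface used later in this work; the stream of fake packet sizes is invented for the example.

    import random
    import time
    from typing import Iterator

    def packet_stream() -> Iterator[int]:
        # An unbounded data stream; here, fabricated packet sizes arriving forever.
        while True:
            yield random.randint(40, 1500)
            time.sleep(0.01)

    def continuous_query(stream: Iterator[int], window: int) -> Iterator[float]:
        # A continuous query: emit the mean packet size over each full window.
        buf = []
        for item in stream:
            buf.append(item)
            if len(buf) == window:
                yield sum(buf) / window   # one result per window, indefinitely
                buf.clear()

    # Runs until stopped by the user, producing a streaming result.
    for mean_size in continuous_query(packet_stream(), window=100):
        print(f"mean packet size of last 100 packets: {mean_size:.1f}")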
2.3 Feature Engineering
Anomaly detection methods based on a machine learning model use measurable infor-
mation as input to decide whether the incoming data follows a regular pattern or not.
How to represent the input to a machine learning algorithm, also known as the features, is an important element of a machine learning project [26]. Often, the collected raw data cannot be used directly as input; instead, derived features need to be constructed from the data source. Feature engineering is the task of choosing the features to be used by the machine learning algorithm, where the choice is often based on domain knowledge from experts within the specific field.
2.3.1 Features
Tran [41] proposes two different categories of network-related features, with respective subcategories and examples:
• Packet traffic features:
Packet traffic-related features inspect individual packets and extract useful informa-
tion from their header and content. The author divides the packet traffic features
into four subcategories.
– Basic packet features: The simplest category is basic packet features; basic header fields such as source port, destination port, header length, and various status flags are some examples.
– Content-based packet features: Content-based packet features are derived from the actual content of the packets.
– Time-based packet features: Time-based packet features measure the number of occurrences of a certain variable during a time period. One example of a time-based packet feature is the number of frames sent to the same destination from a given source in the last t seconds.
– Connection-based packet features: The last subcategory, connection-based packet features, are those that identify characteristics between the sender and receiver. The number of frames to a unique destination in the last n packets from the same source is one example of a connection-based packet feature.
• Network flow traffic features:
The other category is network flow traffic-related features. The main difference between packet traffic features and network flow traffic features is that the latter inspect the flow, meaning a sequence of packets, between source and destination. Analyzing sequences of packets is useful as it enables the identification of patterns that might otherwise be hidden at the individual packet level.
– Basic features: An example of a basic traffic flow feature is the length of the
flow (in seconds).
– Time-window features: The author mentions the number of flows to unique
destination IP addresses inside the network in the last t seconds from the same
source as an example of a time-window feature.
– Connection-based features: The number of flows to a unique destination IP
address in the last n flows from the same source is an example of a connection-
based traffic flow feature.
The ideas discussed by Tran [41] are applied to TCP and IP traffic; however, they remain the same for Profinet traffic. In summary, features can be constructed in various domains, such as the time domain, connection domain, and frequency domain. Which features to select depends on the detection goal and on what the abnormal behavior might look like.
2.3.2 Selection of features
The selection of the features used in this project is based on domain knowledge and intuition about which measurements show clear distinctions between normal and abnormal data. The constructed features build on the assumption stated in Section 2.1.3 that the payload varies more during abnormal operation. Since the goal is to use deep packet inspection and detect abnormality in the content of the packets, content-based packet features are used. The features also have to take connections into consideration, because in an industrial environment the data sent between an IO-Controller and different IO-Devices might not be the same. The IO-Controller can have separate data transmissions to different devices, resulting in diverse data depending on the source and destination connection. Therefore, the machine learning model needs inputs that separate the connections. Another aspect to take into account is that abnormal behavior may not be detectable by a simple analysis of each individual packet; detecting variations in data might require examining several packets over a window or sequence of packets. Schuster et al. [16] construct feature vectors based on sequences of multiple packets. A similar approach is used in this work. The selected features are:
• Standard deviation of the payloads from source to destination for the n last packets
• Number of distinct payloads from source to destination for the n last packets
2.4 Anomaly Detection
The goal of anomaly detection, also referred to as outlier detection, is to identify patterns in data that differ from the expected behavior [35]. The expected behavior
depends on the underlying distribution of the data. Anomalies are those behaviors or
objects that are not considered to be normal. Anomaly detection has been studied
since the early 19th century by the statistics community and finds use in several applications, among them intrusion detection. In intrusion detection systems and network
security, the aim is to find known or unknown anomalies that indicate an attack or
a virus. Garcia-Teodoro et al. [8] divide anomaly-based network intrusion detection
systems into three categories:
• Statistical-based
• Knowledge-based
• Machine learning-based
Statistical-based methods fit a statistical model representing the stochastic behavior of
the system and assume that normal data belongs to the higher probability regions of
the model whereas anomalies lie in the lower probability regions. Incoming data is
compared with the trained model to estimate whether it is an anomaly or not. The
anomaly decision is often based on an anomaly score with a predefined threshold: if the score exceeds the threshold, the system flags the incoming data as an anomaly [8]. Knowledge-based intrusion detection systems are formed by human experts and are often defined by rules describing normal system behavior. The main advantage of knowledge-based anomaly detection systems is the ability to relate the acquired information to the knowledge in the model. Another advantage is that the number of false alarms is often low. The third approach listed above is the machine learning-based method, which is the focus of this thesis.
Machine learning-based anomaly detection uses machine learning algorithms to classify the data as either normal or abnormal. A feature vector can be described as $X(t) \in \mathbb{R}^n$ at time $t \in [0, T]$. Consider two states that define normal or abnormal operation, $w_q$, $q = 0, 1$, where $w_0 = 0$ indicates normal data and $w_1 = 1$ stands for anomalous data. In machine learning, a mapping between $X$ and $w_q$ is made by learning from historic measurements. Consider a data set $D$ of $m$ measurements, where each measurement is an observation of $X$. With $x_i(t)$ denoting the $i$-th observation, also called a training sample in machine learning, $D$ can be described as $D = \{x_i(t)\}_{i=1}^{m}$. Furthermore, the output model set can be described as $D_l = \{y_q\}_{l=1}^{k}$, with $k$ measurements, where the $y_q$ are individual samples of $w_q$, called labels, with $q$ set to either 0 or 1. A pair $(x_i(t), y_i)$ is a labeled measurement. For example, $(x_i(t), 0)$ is a sample of normal operation and $(x_i(t), 1)$ stands for an abnormal sample. There are therefore three types of data sets of measurements:
• Normal data: $D_n = \{x_i(t), 0\}_{i=1}^{k-u}$
• Undefined data: $D = \{x_j(t)\}_{j=1}^{m}$
• Anomalous data: $D_l = \{x_r(t), 1\}_{r=1}^{u}$
The equations are based on work by Thottan et al. [34]. In this project, $D$ is the raw measurement data of Profinet packets. $D_n$ contains the measurements taken while the system is running in normal mode, that is, when no attacks occur. $D_l$ corresponds to observations made while an attack is happening, which is abnormal behavior. Anomaly detection learns a mapping from a training set, consisting of measurements, to the operation state $w_q$. This learning can then be used to classify new incoming events as either normal or anomalous [34].
A training set contains combinations of $D_n$, $D$ and $D_l$. When $D_l$ is included, the learning is said to be supervised, since labels are included in the training set. Although supervised learning methods can provide higher accuracy than unsupervised learning, there are some drawbacks. Labels are often very difficult and time-consuming to obtain in practice, and the knowledge of how the network and its packets behave is often too limited to set proper labels. These concerns apply to network behavior in general but also to the packet level in particular. Another disadvantage is that labeling all possible attacks is not feasible, particularly for new attacks that have never been seen before [36]. When only $D$ is included in the training set, the learning is unsupervised. In unsupervised learning, the goal is to detect anomalies by only looking at the properties of and relations between the data elements in the data set. No labels are required for either normal or abnormal data, which makes unsupervised anomaly detection a viable approach, since no predefined labels are needed to construct a model. For the reasons stated above, this work considers unsupervised learning.
2.5 Machine learning algorithm for anomaly detection
This section describes the anomaly detection algorithm used in this work. Clustering is a common method for anomaly detection. Clustering is an unsupervised classification method used to separate data into groups (clusters), where each cluster has similar characteristics [28]. In clustering, the available data to be grouped is not labeled beforehand, as is the case for supervised learning methods. Instead, the method tries to group a collection of unlabeled patterns into meaningful categories that are obtained only from the data itself [28]. The selection of suitable features for the clustering method is important in order to recognize patterns among the different clusters [29]. Clustering methods can be divided into four categories [30]:
• Partitioning methods
• Density-based methods
• Hierarchical methods
• Grid-based methods
Partitioning methods divide the data into k partitions, where each partition is a cluster. A well-known partitioning-based method is the k-means algorithm, which uses the mean as the function for deciding which cluster an observation belongs to. Density-based algorithms divide data into groups based on density: points that lie close to each other form high-density regions and are grouped into the same cluster, while points in low-density regions are considered outliers. Hierarchical methods group objects into hierarchical structures, while grid-based methods divide objects into grid structures. Chen and Tu [31] describe density-based methods as natural and attractive for data streams, as they can find arbitrarily shaped clusters and need to examine the data only once. They can also handle noise well, and they do not need a prior specification of the number of clusters, unlike the partitioning-based k-means algorithm, where the number of clusters k has to be defined in advance. Examples of density-based methods are DBSCAN, OPTICS, and DENSTREAM. DENSTREAM is based on DBSCAN and has additional features that enable the algorithm to be used with evolving data streams [40]. The algorithm studied in more depth in this work is DBSCAN. The reason DBSCAN is selected over other clustering methods is that it can find arbitrarily shaped clusters, and consequently there is no need to define the number of clusters beforehand. The reason clustering is used in this project instead of another unsupervised machine learning method is its ability to group observations into several groups. As described in Section 2.3.2, the features take connections into account, which means that the model might divide observations into groups related to the data sent between each connection. Other methods besides clustering, namely the one-class support vector machine and the Local Outlier Factor, were intended to be tested in this thesis as well, but this was not possible due to time constraints.
2.5.1 DBSCAN
A well-known density-based clustering algorithm is DBSCAN (Density-Based Spatial Clustering of Applications with Noise). DBSCAN takes two user-defined input parameters: the neighborhood distance Eps and the minimum number of points MinPts. Given a set of points to be clustered, each point can be classified as a core point, a reachable point, or an outlier. A core point is a point with at least MinPts neighbors (including the point itself) within the radius Eps, where the distance can be measured with an arbitrary distance measure such as the Euclidean distance. Each neighbor of the core point within the Eps radius is called a directly density-reachable point and belongs to the same cluster as the core point. These neighbors can themselves be core points; in that case, the points in their neighborhoods are also included in the same cluster, where each such point is a density-reachable point. Non-core points that are density-reachable are called border points. All other points, which are not density-reachable from any other point, are called outliers or noise points and are not included in any cluster [32].
Figure 2.3: DBSCAN. Picture taken from [32].
Figure 2.3 shows an illustration of the DBSCAN model. The MinPts parameter is set to 4, and Eps is visualized by the circles. Point A and all the other red points are core points, since each has at least four points within its neighborhood. The yellow points B and C are not core points, but since they are density-reachable from point A, they still belong to the same cluster and are defined as border points. Point N is not reachable from any other point and is thus considered an outlier. The task of the DBSCAN algorithm is to compute the clusters and find the outliers, or anomalies, as illustrated in the model. Figure 2.4 shows the pseudocode for DBSCAN.
Figure 2.4: Pseudocode of DBSCAN. Picture taken from [32].
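As a complement to the pseudocode, the following is a minimal Python sketch of DBSCAN as described above. It uses Euclidean distance and a naive neighborhood search that scans all points; it is an illustration rather than the implementation used in this work.

    from collections import deque

    def region_query(points, i, eps):
        # Indices of all points within distance eps of points[i], including i.
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    def dbscan(points, eps, min_pts):
        # Assign each point a cluster id (0, 1, ...) or -1 for noise.
        UNVISITED, NOISE = None, -1
        labels = [UNVISITED] * len(points)
        cluster = 0
        for i in range(len(points)):
            if labels[i] is not UNVISITED:
                continue
            neighbors = region_query(points, i, eps)
            if len(neighbors) < min_pts:
                labels[i] = NOISE                  # may later become a border point
                continue
            labels[i] = cluster                    # i is a core point: new cluster
            seeds = deque(neighbors)
            while seeds:
                j = seeds.popleft()
                if labels[j] == NOISE:             # noise reachable from a core point
                    labels[j] = cluster            # ...becomes a border point
                if labels[j] is not UNVISITED:
                    continue
                labels[j] = cluster
                j_neighbors = region_query(points, j, eps)
                if len(j_neighbors) >= min_pts:    # j is itself a core point:
                    seeds.extend(j_neighbors)      # expand the cluster through it
            cluster += 1
        return labels

    # Two dense groups and one isolated point (labeled -1 as noise):
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (9.0, 9.0)]
    print(dbscan(pts, eps=0.5, min_pts=2))   # [0, 0, 0, 1, 1, -1]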
CHAPTER 3
Methodology
This chapter describes the methodology behind the implementation of the project. The various software and hardware tools that are used, and the architecture of how they fit together, are described. The chapter also describes how self-generated network data is processed to create the selected features for the machine learning model, and how the model is implemented and validated. In short, network data is generated according to a simulated use case. The network traffic is sniffed and collected on an edge device where a machine learning platform is installed. The final solution consists of a machine learning algorithm that runs on the device and classifies the incoming data as either normal or abnormal.
3.1 Tools
3.1.1 Hardware
Edge device One of the goals of the project is to run the analysis on a resource-
constrained edge device. The edge device used is a development board from HMS called Beck DK151 (DB150). The board has an embedded controller called SC145 with 128 megabytes (MB) of working memory, 64 MB of flash memory, and an ARM Cortex-A7 processor. The device has a built-in Linux-based operating system called RTOS-LNX. The machine learning program runs on this device.
Profinet IO Controller A Siemens S7-1200 PLC is used as a Profinet IO Controller.
The PLC had to be configured, using the Windows program TIA Portal, to exchange cyclic data with the desired IO device before system startup. The PLC can provide output data to IO devices and also consume data from IO devices. For this project, it is configured to consume incoming data from an IO-device (Anybus X-gateway). It can handle communication at up to 10/100 Mbps.
Profinet IO Device An Anybus X-gateway from HMS Networks with Modbus TCP to Profinet IRT translation is used as a Profinet IO Device. The device permits sending cyclic I/O data between Modbus TCP networks and Profinet: the gateway translates Modbus data into Profinet cyclic I/O data, so that Profinet frames can be sent from the X-gateway to the PLC. The reason a Modbus TCP gateway is included in the project is that it enables generating Profinet data in a very flexible way, as described further below.
Network sniffer A network tap is used to sniff the network traffic for monitoring. The tap sits on the connection between the IO Controller and the IO Device and forwards the traffic to the edge device. It should be clarified that the edge device is not a Profinet IO Device; it only collects the traffic sniffed by the network tap.
Visual Analyzer A personal computer (PC) is used for data visualization and deploy-
ing queries to the data streams.
3.1.2 Software
Packet generator The initial idea was to program the PLC to send Profinet data from the PLC to an IO device. However, after discussions with employees at HMS, it was decided to change the architecture due to the complexity of generating the desired Profinet data. Instead, a packet generator was written in Python to send the desired cyclic data from the Anybus X-gateway to the PLC. The Modbus/TCP client library pyModbusTCP is used in the Python program to write data over Modbus TCP to the gateway. The data is then transferred by the gateway over Profinet to the PLC. This architecture makes it very flexible to try different structures for the data sent over the Profinet network, and changes can be made very quickly. The operation of the Python program is described in Section 3.3.
Packet collector The cyclic Profinet traffic is sniffed and collected on the edge device. The software application that reads the incoming traffic on the Ethernet port of the device is written in the programming language C. The board includes an application programming interface (API) called Packet API that provides functions for the reception of Ethernet packets. The application reads all incoming traffic and filters for cyclic Profinet I/O packets; a Python analogy of this filtering loop is sketched at the end of this subsection.
Stream engine The analysis is performed with the help of a stream processing and analysis system. The platform is called sa.engine and is provided by a company named Stream Analyze. The platform supports online analysis of data streams, including the deployment of statistical and machine learning models to resource-constrained edge devices. The largest configuration of the software requires only 7 MB of storage and was installed on the board without issues. The platform permits the creation of data streams in the programming language C. Since C is used for the data collection, the collection and the creation of the stream are merged into the same program. The stream consists of arrays, where each array contains the data sources intended for analysis. Once the initialization of the streams is completed, they are used by the analysis tool in the platform to run continuous queries against the streams. For the analyst, the platform provides a Visual Analyzer running on a PC. The Visual Analyzer consists of a graphical user interface where queries can be written and deployed to edge devices. The queries are written in a language similar to SQL, called OSQL. The CQs analyze the data streams, and the results are sent back to the Visual Analyzer, where they can be visualized either as text or as appropriate graphical plots. The communication between the Visual Analyzer and the edge device occurs over TCP. sa.engine also has support for developing machine learning models in the Visual Analyzer. After training, a model can be deployed on the edge device where the online analytics runs.
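The Packet API is specific to the board, but the collector's filtering loop can be illustrated with a Python analogy using a raw socket (Linux-only, and it requires elevated privileges). Here is_cyclic_io_frame is the filter sketched in Section 2.1.2, the interface name eth0 is an assumption, and handle is a hypothetical placeholder for downstream processing.

    import socket

    ETH_P_ALL = 0x0003   # capture all Ethertypes
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
    sock.bind(("eth0", 0))                # assumed capture interface

    while True:
        frame, _ = sock.recvfrom(65535)   # one raw Ethernet frame
        if is_cyclic_io_frame(frame):     # keep only cyclic Profinet IO packets
            handle(frame)                 # hypothetical downstream processing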
3.2 Architecture
The overall architecture of the project is described in Figure 3.1. The Python script writes data over Modbus TCP to the Anybus X-gateway. The gateway then translates the data into Profinet and sends the Profinet frames via the network tap to the PLC. The PLC acts as a Profinet IO Controller, which receives input from the IO Device (Anybus X-gateway). In a real-world setting, this input could be a sensor value acting as an input to a control loop in the PLC, or a simple monitoring measurement. The meaning of the Profinet traffic is not relevant for this project, nor is the direction of the traffic, since the goal is to find anomalies in the packets. While the traffic is being generated between the IO Device and the IO Controller, the network tap sniffs the packets and sends them to the edge device, where the anomaly detection algorithm runs. Data visualization occurs on the laptop via sa.engine's Visual Analyzer.
3.3 Generation of network data
The aim of this project is to detect anomalies in Profinet packets. An already running system setup generating real data is not available, nor is actual data generated from a real attack. Therefore, a simulated use case needs to be constructed, where both normal and abnormal data is generated. This simulated case, written in Python, strives to be as realistic as possible. It should also be generic, meaning that the detection method should work for the general case and not be focused on one specific attack scenario. As stated in Section 2.1.3, normal data is more static than abnormal data. This is taken into consideration in the script, which sticks to the idea that a PLC normally operates in strict patterns with small changes in inputs and outputs.
Figure 3.1: Overall architecture of hardware and data transmission.
Taking an industrial robot as an example, the movement of the robot is fixed; the same holds for a conveyor belt. The idea and structure of the generated data were defined in conjunction with experts from HMS. Although the use case is supposed to be generic, it requires grounding in real-world logic. In an industrial setting that uses Profinet, a PLC sends all the information needed in the cyclic I/O payload. Taking the example of an electric motor drive, information such as speed and direction is embedded into the payload block. The use case is therefore modeled on a motor drive scenario, where speed and direction are randomized and embedded into the Profinet packets. The script generates normal data and abnormal data separately. The generation of data for normal operation follows this procedure:
1. Randomize a time t (0-5 seconds).
2. Randomize a speed s (0-30).
3. Randomize a direction d (0 or 1).
4. Write the speed and direction into registers in the Anybus X-gateway for t seconds.
5. Profinet cyclic I/O data is sent to the PLC.
6. Repeat steps 1 to 5 until the program is stopped by the user.
For abnormal data generation, the procedure is the same. The only difference is the speed range: s is randomized between 0 and 1000, so the abnormal data has a larger variance.
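A minimal sketch of how such a generator could look with pyModbusTCP is shown below. The gateway address and the register layout (speed in one holding register, direction in the next) are assumptions for illustration; the actual mapping depends on the X-gateway configuration.

    import random
    import time
    from pyModbusTCP.client import ModbusClient

    GATEWAY_HOST = "192.168.0.10"   # assumed address of the Anybus X-gateway
    SPEED_REG = 0                   # assumed layout: speed, then direction
    client = ModbusClient(host=GATEWAY_HOST, port=502, auto_open=True)

    def generate(max_speed):
        # Randomize speed/direction and hold each value pair for 0-5 seconds.
        while True:
            t = random.uniform(0, 5)
            s = random.randint(0, max_speed)
            d = random.randint(0, 1)
            client.write_multiple_registers(SPEED_REG, [s, d])
            time.sleep(t)        # the gateway keeps sending the data cyclically

    generate(max_speed=30)       # normal operation: s in [0, 30]
    # generate(max_speed=1000)   # abnormal operation: s in [0, 1000]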
3.4 Data processing
The collected data needs to be processed to extract useful information from the raw traffic sniffed from the network. The raw network traffic needs to be filtered and structured to create proper features for the detection method. The steps involved in the data processing stage are filtering, cleaning, normalization, and feature extraction.
3.4.1 Data collection
Since no external data is available beforehand, all data is collected directly from the traffic
in the network. After the production of the Profinet traffic, as described in Section 3.3,
the network data is ready to be sniffed and captured. Raw Profinet packets are collected
directly on the edge device as described in Section 3.1.
3.4.2 Pre-processing
Useful information has to be produced from the raw network traffic in order to detect abnormal
data. The network traffic is filtered to consider only Profinet frames, which in turn
are filtered to consider only cyclic I/O data. Therefore, frames with a frame ID between
0x8000 and 0xBBFF are used. This filtering is performed in the C application running on the
edge device. To ensure that there are no duplicates or missing values, Wireshark is used
to record a sample of the traffic on the PC. The recorded traffic is relayed to the edge
device, and the recorded traffic file is compared with the output on the device to verify
that the traffic is identical. This comparison revealed neither missing packets nor duplicates.
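For illustration, the filter logic can be sketched in Python as follows; the actual implementation is in C. The offsets assume an untagged Ethernet II frame, in which Profinet real-time traffic carries EtherType 0x8892 followed by a two-byte frame ID.

```python
import struct

PROFINET_ETHERTYPE = 0x8892  # EtherType used by Profinet real-time frames

def is_cyclic_io(frame: bytes) -> bool:
    """Return True if a raw Ethernet frame is a Profinet cyclic I/O frame.

    Layout assumed: dst MAC (6 bytes) + src MAC (6 bytes) + EtherType
    (2 bytes) + frame ID (2 bytes); VLAN-tagged frames would shift these
    offsets and are not handled in this sketch.
    """
    if len(frame) < 16:
        return False
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != PROFINET_ETHERTYPE:
        return False
    (frame_id,) = struct.unpack("!H", frame[14:16])
    return 0x8000 <= frame_id <= 0xBBFF  # cyclic I/O range used in this work
```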
When the filtering and cleaning stage is completed, the next step is to generate data streams
containing the data sources. The data streams are created with sa.engine in the same application
running on the device. Each element in the data stream is represented as an array
containing all the data sources intended for the stream. The collected frames include
useful information, as indicated in Figure 2.2, from which data sources can be
fetched. The following data sources, all based on each individual packet, are used:
• Timestamp
• Packet size
• MAC source address
• MAC destination address
• IO Data size
• Frame ID
• Cycle counter
• IO Data
Each item in the stream is represented by these packet characteristics. When the data
stream is ready, it is queried from the visual analyzer and the elements in the
stream are saved to a .json file on the PC. This procedure is carried out separately for
normal operation and abnormal operation, resulting in two data sets: one containing
only normal data and one containing only abnormal traffic.
The file with normal data consists of 78349 rows (packets) and 8 columns (data sources),
whereas the file with abnormal data contains 46376 rows. Having the data stored
in files provides greater flexibility to experiment with the data.
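To make the element layout concrete, the following sketch shows one stream element as an array of the eight data sources, appended to a .json file; the field values and file name are illustrative only.

```python
import json

# One stream element: an array of the eight data sources listed above.
element = [
    1556012345.123,        # timestamp
    60,                    # packet size (bytes)
    "00:30:11:0a:0b:0c",   # MAC source address
    "00:30:11:0d:0e:0f",   # MAC destination address
    4,                     # IO data size (bytes)
    0x8001,                # frame ID
    4711,                  # cycle counter
    [18, 0, 1, 128],       # IO data payload bytes
]

# Appending elements line by line yields a file that can later be
# replayed as a stream for offline experiments.
with open("normal_data.json", "a") as f:
    f.write(json.dumps(element) + "\n")
```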
3.4.3 Feature extraction
The collected data sources need to be converted into useful features that can be used in
the machine learning model. The individual data sources in the stream cannot be used
directly as input to the machine learning algorithm; a data transformation is required
to enable their use. Thus, a data set containing the features to be used by the machine
learning algorithm is constructed. The feature engineering process is carried out offline,
and the feature extraction procedure operates on the data sources stored in the .json files.
The feature construction is implemented in a function called feature_extraction(windows)
using the OSQL language. The function takes a stream of windows of specific size and
stride. The windows are created by the built-in function winagg(s, sz, str) in sa.engine,
which forms a stream of windows of size sz and stride str over a stream s. The stream s
is created by calling the function read_stream(file), which creates a stream containing the
elements in a .json file. The stream contains the data sources listed in the previous
subsection. The selected features described in Section 2.3.2 are thereby constructed over
windows of the data sets. An explanation of how the selected features are extracted is
provided below.
Standard deviation of the payloads from source to destination for the n last packets
This feature is created by taking the standard deviation of the bytes in all payloads
in a window of size n. The bytes in the payload are first divided by 256 (the number
of possible byte values) to obtain normalized values between 0 and 1.
Number of distinct payloads from source to destination for the n last packets
This feature is created by counting the number of distinct variants of IO Data
contents per window of size n. Many variations in the payload from packet to
packet result in a high count, and vice versa. The feature is normalized by dividing
by the window size, the maximum possible number of variations in a window,
yielding a feature that varies between 0 and 1.
Since features are created per window and per connection (source to destination), the
result is one feature vector for each connection in each window.
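The following Python sketch mirrors the OSQL feature extraction under these definitions. The packet field names are illustrative; winagg-style windowing is emulated by a simple size-and-stride slicer. Whether the thesis uses the population or sample standard deviation is not stated, so the population form is assumed here.

```python
import statistics
from collections import defaultdict

def window_stream(packets, size, stride):
    """Emulate winagg(s, sz, str): windows of `size` with step `stride`."""
    for i in range(0, len(packets) - size + 1, stride):
        yield packets[i:i + size]

def extract_features(window):
    """Compute the two features per connection over one window.

    Each packet is assumed to be a dict with 'src', 'dst' and 'payload'
    (a list of byte values 0-255)."""
    by_conn = defaultdict(list)
    for pkt in window:
        by_conn[(pkt["src"], pkt["dst"])].append(bytes(pkt["payload"]))

    features = {}
    for conn, payloads in by_conn.items():
        # Feature 1: std of all payload bytes, normalized by dividing by 256.
        values = [b / 256 for p in payloads for b in p]
        stdev = statistics.pstdev(values) if values else 0.0
        # Feature 2: distinct payload variants, normalized by the window size.
        distinct = len(set(payloads)) / len(window)
        features[conn] = [stdev, distinct]
    return features
```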
Once proper features have been selected and validated and the system runs online,
the feature processing is performed online on the data stream instead of on a static file. The
input to the feature extraction function then becomes the online real-time stream originating
from the edge device. Furthermore, both the feature extraction and the machine learning
algorithm run on the edge device when the system is online, instead of on the PC
(the visual analyzer). See Figure 3.2 for an illustration of the process that goes
from packet collection to feature extraction on the data stream. The window size in
the figure is 5; however, different window sizes are tested during validation to examine to
what extent the window size influences the algorithm's accuracy.
Figure 3.2: Data processing.
3.5 Creation of training and validation sets
The training of the clustering model is divided into two different scenarios. In the real
world, it is impossible to obtain data resulting from an attack before the attack has
happened, which makes it difficult to create a training set based on abnormal data. If
the aim is to detect unseen attacks that result in behavior never observed before, a sound
approach is to train the model on normal data only and treat everything that deviates
from normal behavior as an anomaly [16]. On the other hand, it is interesting to see the
results of a clustering model trained with both normal and abnormal data. Since abnormal
examples of traffic are simulated and available for this project, it is possible to use that
abnormal data for training. Having described these two situations, this project considers
two different scenarios.
Scenario A Clustering model is trained using features based on both normal and abnormal data.
Scenario B Clustering model is trained using features based on normal data only.
In scenario A, where the model is trained using features extracted from both normal and
abnormal operation, the validation set also consists of feature vectors extracted from
normal and abnormal data. Figure 3.3 describes how the training set and validation set
are generated. Features are extracted in the format described in Section 3.4.3, where 90%
of the features are derived from normal data and 10% from abnormal data. Each feature
vector carries a label describing the data as normal or abnormal. Note that the training-set
labels are not used during training; they serve only to validate the model.
Figure 3.3: Training set and validation set for scenario A.
In scenario B, the distribution of the training and validation sets differs from scenario A.
A feature data set is created containing only features from normal data, of which 70%
go into the training set. The validation set is more involved, since it should include
features labeled both normal and abnormal. The remaining 30% of the data set described
above go into the validation set and are labeled as normal. To also have observations
labeled abnormal in the validation set, features are generated from a mix of x % normal
data and (100 - x) % abnormal data. Even though this mix contains observations from
normal data, it is still labeled abnormal. The motivation reflects a real-world attack:
the traffic may be malicious as long as it contains some abnormal data, regardless of
the proportion. Figure 3.4 illustrates the distribution of the training and validation
sets when the machine learning model is trained using only features labeled normal; a
sketch of the set construction follows the figure.
Figure 3.4: Training set and validation set for scenario B.
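A minimal sketch of the scenario B set construction is given below; the list handling and the shuffling step are assumptions, since the thesis does not specify how the 70/30 split is drawn.

```python
import random

def scenario_b_sets(normal_features, abnormal_mix_features, train_frac=0.7):
    """Build the scenario B training and validation sets.

    normal_features:       feature vectors extracted from normal data only.
    abnormal_mix_features: feature vectors from traffic mixing x% normal
                           with (100 - x)% abnormal data; labeled abnormal
                           regardless of the mix.
    """
    shuffled = list(normal_features)
    random.shuffle(shuffled)
    split = int(train_frac * len(shuffled))

    training_set = shuffled[:split]  # unlabeled; used to fit the model
    validation_set = (
        [(v, "normal") for v in shuffled[split:]]
        + [(v, "abnormal") for v in abnormal_mix_features]
    )
    return training_set, validation_set
```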
3.6 Implementation of machine learning algorithm
The machine learning part of this project is implemented using the same stream processing
engine, sa.engine, which has built-in support for various machine learning algorithms,
including DBSCAN. The step after feature extraction, as described in Section
3.4.3, is to train the DBSCAN model. The model is trained with the feature vectors from
the generated training set described in Section 3.5. The first step is to store the feature
vectors that are to be clustered by calling the built-in populate_data(s) method, which
takes a stream s of feature vectors from the training set as an argument. Once this is done,
the model is trained with the stored feature vectors by calling conn_dbscan(eps, minPts)
with the defined Eps and MinPts as arguments. Prediction results are generated by the
method conn_dbscan_classify(v, eps, minPts), where the argument v is a row in the data
set, i.e., a feature vector.
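The internals of sa.engine's DBSCAN are not documented here, but the same train-then-classify pattern can be sketched offline with scikit-learn. The classify step below assigns a new vector to the cluster of the nearest core point within Eps, or marks it as an outlier (-1) otherwise; this mirrors what conn_dbscan_classify is used for, not its actual implementation.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def train_dbscan(training_vectors, eps, min_pts):
    """Offline analogue of populate_data(s) followed by conn_dbscan(eps, minPts):
    fit DBSCAN and keep the core points with their cluster labels."""
    X = np.asarray(training_vectors, dtype=float)
    model = DBSCAN(eps=eps, min_samples=min_pts).fit(X)
    core = model.core_sample_indices_
    return X[core], model.labels_[core]

def classify(v, core_points, core_labels, eps):
    """Analogue of conn_dbscan_classify(v, eps, minPts): return the cluster
    of the nearest core point within eps, or -1 for an outlier."""
    if len(core_points) == 0:
        return -1
    dists = np.linalg.norm(core_points - np.asarray(v, dtype=float), axis=1)
    i = int(np.argmin(dists))
    return int(core_labels[i]) if dists[i] <= eps else -1
```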
3.7 Validation method
3.7.1 Validation of clustering model
Evaluation of the implemented system is based on how well the clustering algorithm
manages to classify abnormal and normal data. Evaluating clustering algorithms, and unsupervised
methods in general, is not completely straightforward. The aim of clustering
is to group patterns without predefined labels; yet, to measure how well the
model categorizes normal and abnormal traffic, labels are needed, not for training the
model but for validation. As described in Section 3.5, the clustering model is trained
in two different ways. Therefore, the validation methodology also differs between the
two scenarios.
Figure 3.5: Confusion matrix describing relation between predicted and actual values.
The validation metrics are based on how well the clustering model distinguishes normal
data from abnormal data. Four concepts need to be introduced here: true positives (TP),
true negatives (TN), false positives (FP), and false negatives (FN). In intrusion detection
systems, these concepts describe the relation between what the system detects for an
analyzed event (normal or intrusion) and the event's actual nature (innocuous or malicious),
as described in Figure 3.5. A TP is a malicious event correctly classified as
an intrusion. A TN is an innocuous event classified as normal. An FP is an
innocuous event classified as an intrusion, in other words a false alarm. An FN occurs
when a detection system classifies a malicious event as normal. The objective of an intrusion
detection system is a low number of FPs and FNs and a high number of TPs and
TNs [8].
For scenario A, where the model is trained with both normal and abnormal data, the
concepts signify the following:
• TP: Abnormal labeled features end up in an abnormal cluster or as an outlier.
• TN: Normal labeled features end up in a normal cluster.
• FP: Normal labeled features end up in an abnormal cluster or as an outlier.
• FN: Abnormal labeled features end up in a normal cluster.
A cluster is considered normal if at least 70% of its points are labeled normal in the
training set; otherwise, it is considered abnormal. This threshold is not precise and
can be adjusted based on intuition, but 70% is judged a reasonable value. Because
clusters in the trained model may contain points originating from abnormal data, the
clusters must be labeled, using the labels from the training set, before the validation
can be performed.
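A sketch of the 70% labeling rule, assuming cluster ids as produced by DBSCAN (-1 for outliers) and string labels from the training set:

```python
from collections import Counter

def label_clusters(cluster_ids, training_labels, threshold=0.7):
    """Label a cluster 'normal' if at least `threshold` of its points are
    labeled normal in the training set; otherwise 'abnormal'. Outliers
    (cluster id -1) are always treated as abnormal."""
    normal_counts, totals = Counter(), Counter()
    for cid, label in zip(cluster_ids, training_labels):
        totals[cid] += 1
        if label == "normal":
            normal_counts[cid] += 1
    return {
        cid: "normal"
        if cid != -1 and normal_counts[cid] / totals[cid] >= threshold
        else "abnormal"
        for cid in totals
    }
```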
For scenario B, the model is trained using only normal data. The definitions of TP, TN,
FP, and FN are therefore slightly different in this case. The following definitions apply:
• TP: Abnormal labeled features end up as an outlier.
• TN: Normal labeled features end up in a cluster.
• FP: Normal labeled features end up as an outlier.
• FN: Abnormal labeled features end up in a cluster.
A normal feature vector is a feature vector constructed from normal data, while an
abnormal feature vector is one that is built on x % normal data and (100-x) % abnormal
data, as illustrated in Figure 3.4.
Three metrics are used to measure the performance of the clustering model: Recall,
Precision, and F1-score. Recall measures the TP rate and indicates how well the model
classifies incoming malicious observations. The formula for Recall is:

Recall = TP / (TP + FN)    (3.1)

Looking at the formula, a high recall means that the model is good at classifying
abnormal data correctly, since FN will then be low.
Precision, on the other hand, indicates how well the model handles normal observations,
i.e., how rarely it raises false alarms. The formula for Precision is:

Precision = TP / (TP + FP)    (3.2)
A low number of FPs results in high precision, and vice versa. The ideal for an
intrusion detection system is recall and precision that are both as high as possible. Since one of
the problem formulations in this thesis asks whether it is possible to detect abnormal data,
recall is a relevant metric. In addition, since false alarms are a major problem in intrusion
detection systems, precision must be considered as well.
The last metric is the F1-score, described by the following equation:

F1 = 2 · (Precision · Recall) / (Precision + Recall)    (3.3)

The F1-score is the harmonic mean of recall and precision; the best value is 1 and
the worst is 0.
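A direct translation of Equations 3.1–3.3 into Python, including the NaN case for precision when TP + FP = 0:

```python
def recall(tp, fn):
    # Equation 3.1: share of malicious observations that are detected.
    return tp / (tp + fn) if (tp + fn) > 0 else float("nan")

def precision(tp, fp):
    # Equation 3.2: NaN when TP + FP = 0, i.e. when every observation
    # was classified as normal and no alarms were raised.
    return tp / (tp + fp) if (tp + fp) > 0 else float("nan")

def f1_score(p, r):
    # Equation 3.3: harmonic mean of precision and recall.
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0
```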
The validation will be split into the two scenarios A and B. In both scenarios, the
DBSCAN model will be validated with different values of Eps, MinPts, and feature
window size. Eps and MinPts will be varied to see what impact these values have
on DBSCAN; see Section 2.5.1 for a description of Eps and MinPts.
For scenario B, different mixes of normal and abnormal data will be used for the
validation set (the value x in Figure 3.4). This will measure the model's ability to
detect abnormal traffic when the data deviates only slightly from normal; in other words, how
much abnormal data an attack must contain for the method to detect it.
3.7.2 Validation of the online data stream case
One of the problem formulations is whether a machine learning algorithm can run on
a resource-constrained edge device. Therefore, the goal is to run the clustering model
on the edge device and detect abnormal traffic in the form of online data streams. The
validation will be based on how well the model classifies generated abnormal traffic as
malicious; thus, recall will be measured. The network traffic will be generated in the same
way as described in Section 3.3, with the only difference that normal data will be mixed
with abnormal data at x % normal and (100 - x) % abnormal. Features will be
extracted directly on the online data stream, as shown in Figure 3.2. The model will be
trained as in scenario B with only normal data. The training occurs on a PC, and the
trained clustering model is then deployed on the edge device. The selected parameters for
the clustering will be the ones that generate the best results for scenario B.
CHAPTER 4
Results
This chapter describes the results from the validation of the clustering model, both
when the model is trained with abnormal and normal data and when it is trained
with normal data only. Results when the solution runs on online data streams are also
presented.
4.1 Clustering method
4.1.1 Scenario A
Figure 4.1 presents the results from the validation of the clustering model when the model
has been trained with both normal and abnormal features. As can be seen, the choice
of Eps and MinPts is of great importance when designing the model. One
interesting observation is that precision becomes worse with smaller Eps and greater
MinPts when extracting features with window sizes of 1000, 2000, and 4000 packets; that is,
the number of false alarms increases when Eps is low and MinPts is high. This might be
because, with such parameters, nearly every observation becomes an outlier regardless of its label.
The same pattern can be seen for 500-packet windows, but not as clearly. Comparing
window sizes, precision is better with a smaller window size. An
interruption in a line indicates a NaN (not a number). Precision is NaN when TP + FP
is zero, causing a division by zero in the precision equation; this happens when all incoming
observations are classified as normal, so that no TPs or FPs occur. A high value of Eps
can lead to all points falling into normal clusters.
Recall is not affected by Eps and MinPts to the same extent. On the other hand,
it improves as the window size grows. With window sizes of 2000 and 4000, recall
reaches 100%, leaving no undetected malicious behavior. Window sizes greater than
4000 are not tested in this project, because a too large window size would yield too few
feature vectors, given the relatively low number of rows in the data set. This investigation
can, however, continue in future work.
All the results for scenario A are available as tables in Appendix A.
4.1.2 Scenario B
The results from the model trained with only normal data can be seen in
Figures 4.2 and 4.3. Figure 4.2 shows Recall, Precision, and F1-score when x% normal data
and (100 - x)% abnormal data are used in the feature vectors labeled abnormal; refer
again to Figure 3.4 for an illustration of these definitions. The features are extracted from
windows of size 4000. As can be seen, precision is 100% in most cases when
Eps is low. As in scenario A, NaN means that every observation is classified as
normal: no abnormal-labeled features are classified as outliers, nor do any normal-labeled
features become outliers. All abnormal features then land in a cluster, producing many
FNs, which is undesirable. When 30% of the abnormal-labeled traffic comes from abnormal
data, precision is 100% regardless of Eps and MinPts, meaning that no false alarms occur.
The recall results are not promising when only 5% of the abnormal-labeled features come
from abnormal data; the best recall is then 32.5%. For 10% abnormal data it is
85.71%, still not fully reliable. However, with 20% abnormal data,
the clustering method achieves 100% recall with Eps 0.008 and MinPts 8. This
means that when at least 20% of all Profinet packets in a network are malicious, the
detection method detects the attack in every case. Since the detection method
considers connections and analyzes all traffic on the network, this could correspond either
to an attacker completely hijacking 20 out of 100 machines and sending 100% malicious data
to each of them, or to the Profinet traffic to every machine containing 20% packets from
the attacker and 80% packets from normal operation.
Figure 4.3 shows the importance of selecting a proper window size: it compares
results when the window size is varied and the ratio of abnormal to normal data is
fixed. As can be noticed, the bigger the window size, the better the recall, meaning that the
model is better at detecting attacks with larger windows. As a visualization of
this, Figure 4.4 shows a scatter plot of the selected features for window sizes 500 and 4000
when 90% of the abnormal-labeled traffic comes from normal data. The distinction
between normal and abnormal is cleaner for the larger window size, which is why the model
then classifies abnormal traffic more accurately.
Comparing scenarios A and B in the best case, with window size 4000 and suitably chosen
Eps and MinPts, there is no difference between precision and recall: both are 100%.
All the results for scenario B are available as tables in Appendix A.
Table 4.1: Results for Scenario B on online data streams when changing x% normal traffic.
Window size: 4000 packets
          95 % normal   90 % normal   80 % normal
Recall    57,1 %        71,4 %        100 %
4.1.3 Online data stream case
The stream engine platform, which includes support for machine learning, was installed
on the device without issues; the device had no problems handling the 7 MB
footprint. The trained model was deployed on the device, where the analytics ran
on online data streams without obstacles. The recall results when the clustering model ran
online on the edge device are shown in Table 4.1. The selected
parameters for the clustering model are Eps = 0.008 and MinPts = 8, as these
gave the most promising results in scenario B. The recall is rather similar to when
the model ran offline. When the generated traffic consists of 20% abnormal data,
the detection method classifies the traffic as an attack 100 times out of 100, which is a
promising result. Only window size 4000 is presented, since it gives the best recall.
With smaller windows, however, the detection system can send a faster response, in the form
of an alarm, since calculating features over smaller windows in the data stream takes
less time.
Figure 4.1: Results for Scenario A.
Figure 4.2: Results for Scenario B when changing x% normal traffic.
Figure 4.3: Results for Scenario B when changing window size.
Figure 4.4: Scatter plot of the selected features over different window sizes, normalized between
0 and 1.
CHAPTER 5
Discussions, Conclusions and Future Work
The goal of this project was to implement and evaluate a machine learning solution
that aims to detect abnormal application data in industrial networks. In addition, the
solution should run locally on a resource-constrained edge device from HMS. Running the
analytics close to the industrial floor reduces the network traffic that would otherwise be
required if the analytics ran in the cloud.
5.1 Discussions and Conclusions
The investigated questions in this thesis were related to anomaly detection in industrial
networks using a resource-constrained edge device. This section answers the research
questions stated in Section 1.3.
Q1 Can abnormal data in an industrial network be detected using Deep Packet Inspection and Machine Learning?
The results show that when features are extracted with a proper window size, it is
possible to separate normal from abnormal data, thereby permitting the detection
of abnormal observations. In this project, the network data is self-constructed,
including the appearance of both normal and abnormal data. This may explain why
the recall and precision scores are high in the best-case scenario. Because this work
was carried out using simulated data, it is not possible to ensure the performance
of the approach in all operational cases; investigating additional cases requires
a solution that is trained and validated on data generated from a real industrial
environment. The goal of this implementation was that the detection system
should operate in the general case, meaning that the method should work not only
for one attack scenario but for all false-data injection attacks in any kind of industrial
application. The unsupervised model should thus learn what is normal for a general
use case and detect abnormal behavior solely from how the incoming application
data is classified by the model. This requires a solution that works for any case
without prior knowledge of the meaning of the network data. If the assumptions in
Section 2.1.3 hold true in the real world, and the generated traffic in this project
is realistic, it can be concluded that it is possible to detect abnormal data in an
industrial network using deep packet inspection and machine learning. This
conclusion applies only to the Profinet protocol.
Q2 To what extent, as it relates to performance, speed, and system footprint, can an
unsupervised anomaly detection algorithm be useful on a resource-constrained edge
device?
Detecting anomalies in real time, independent of the use case, requires that the
processing of the incoming data occurs as close as possible to the data source.
The results show no significant difference in performance between the model running
online on the edge device and running offline, provided adequate clustering
parameters and window size are selected. This indicates that the selected unsupervised
anomaly detection method can retain its high detection rate even when the system
runs on a resource-constrained edge device with online data streams. The edge device
had no problems handling the required footprint of the stream engine platform, including
the machine learning tools. As described in the results, the detection system provides
more frequent prediction responses when the window size is small, indicating that the
number of packets from which features are extracted affects how quickly the system
can respond. Extracting features from a small number of packets at a time also requires
less processing power than extracting over bigger windows. There is thus a trade-off when
deciding how many packets to analyze at a time, because bigger windows at the same
time produce better prediction performance. An investigation of whether a supervised
anomaly detection method would have performed better or worse was not part of this
thesis. However, it can be concluded that, with respect to storage, an anomaly-based
approach is preferable to a signature-based system, since it does not need to store
signatures for all known attacks but instead makes decisions based only on the observed
data.
Q3 Is the implementation of the solution feasible in an existing device available in the market?
The requirements on a complete solution like the one implemented in this project
are high if it is to run in a real-world setting on devices available in the market.
The system must have a good detection rate of abnormal data in order not to miss
any potential attacks. At the same time, the system should not generate too many
false alarms, which would reduce confidence in the system. The physical
performance is also important: the system should run without crashes or
interruptions. Since the results show that the detection rate is good when proper
parameters are selected, and the physical performance is good in terms of speed
and required storage space, this thesis work demonstrates that it is feasible to
implement a machine learning solution that runs on an existing resource-constrained
device available in the market. In addition, this solution is able to detect anomalies
in industrial networks, per the answer to Q1. However, a full judgment
of a real-world scenario requires validation on real-world data, which was not
available for this thesis work.
5.2 Future Work
Although the implemented solution achieved good scores in the validation of the detection
method and performance, some future work remains to improve the proposed solution.
The investigation of different anomaly detection methods was not exhaustive; the only
algorithm implemented and tested was DBSCAN. Some research papers bring up the
one-class support vector machine as a potential alternative for unsupervised anomaly
detection [16], and it would be interesting to evaluate whether it generates better results.
DenStream is another algorithm that would be interesting to evaluate. Both of these
algorithms are supported in the platform used in this work, but time constraints did not
permit testing them. Another aspect left for future work is to train and validate
the model on a real-world Profinet network, to see whether the validation measurements
provide similar results. This is required to conclude whether the implemented
solution is sufficient for use in a real-world environment. The results from a real-world
case would also confirm whether the assumptions stated in Section 3.3 apply to real-world
environments. As discussed in Section 2.1.3, the anomaly-based intrusion detection
system may identify anomalies that are not actual attacks, even though they are
anomalies according to the model. A future study should therefore measure how many of
the correctly classified anomalies are actually attacks; this would require data generated
from real-world attacks, which was not available for this thesis work. Future work also
includes an investigation of greater window sizes for feature extraction, to see whether
the results improve further or eventually begin to degrade. Finally, it could be useful to
start building a database of labeled data from different types of attacks, in order to test
supervised machine learning algorithms and compare the results with those in this work.
APPENDIX A
Tables
Table A.1: Results for Scenario A with window size: 500 packets.
Window size: 500 packets
Precision Recall F-score Eps MinPts
NaN 0 % 0 % 0,05 2
100 % 87,5 % 93,3 % 0,025 2
100 % 100 % 100 % 0,01 2
100 % 100 % 100 % 0,005 2
100 % 100 % 100 % 0,003 2
100 % 100 % 100 % 0,002 2
100 % 12,5 % 22,2 % 0,05 4
100 % 87,5 % 93,3 % 0,025 4
100 % 100 % 100 % 0,01 4
100 % 100 % 100 % 0,005 4
100 % 100 % 100 % 0,003 4
100 % 100 % 100 % 0,002 4
100 % 25 % 40 % 0,05 8
100 % 87,5 % 93,3 % 0,025 8
100 % 100 % 100 % 0,01 8
100 % 100 % 100 % 0,005 8
100 % 100 % 100 % 0,003 8
80 % 100 % 88,9 % 0,002 8
100 % 37,5 % 54,5 % 0,05 16
100 % 87,5 % 93,3 % 0,025 16
80 % 100 % 88,9 % 0,01 16
80 % 100 % 88,9 % 0,005 16
80 % 100 % 88,9 % 0,003 16
72,7 % 100 % 84,2 % 0,002 16
Table A.2: Results for Scenario A with window size: 1000 packets.
Window size: 1000 packets
Precision Recall F-score Eps MinPts
NaN 0 % 0 % 0,05 2
100 % 100 % 100 % 0,025 2
100 % 100 % 100 % 0,01 2
66,7 % 100 % 80 % 0,005 2
66,7 % 100 % 80 % 0,003 2
66,7 % 100 % 80 % 0,002 2
NaN 0 % 0 % 0,05 4
100 % 100 % 100 % 0,025 4
66,7 % 100 % 80 % 0,01 4
66,7 % 100 % 80 % 0,005 4
66,7 % 100 % 80 % 0,003 4
66,7 % 100 % 80 % 0,002 4
NaN 0 % 0 % 0,05 8
100 % 100 % 100 % 0,025 8
40 % 100 % 57,1 % 0,01 8
40 % 100 % 57,1 % 0,005 8
40 % 100 % 57,1 % 0,003 8
40 % 100 % 57,1 % 0,002 8
100 % 100 % 100 % 0,05 16
100 % 100 % 100 % 0,025 16
13,8 % 100 % 24,2 % 0,01 16
13,8 % 100 % 24,2 % 0,005 16
13,8 % 100 % 24,2 % 0,003 16
11,4 % 100 % 20,5 % 0,002 16
Table A.3: Results for Scenario A with window size: 2000 packets.
Window size: 2000 packets
Precision Recall F-score Eps MinPts
100 % 100 % 100 % 0,05 2
100 % 100 % 100 % 0,025 2
100 % 100 % 100 % 0,01 2
50 % 100 % 66,7 % 0,005 2
50 % 100 % 66,7 % 0,003 2
50 % 100 % 66,7 % 0,002 2
100 % 100 % 100 % 0,05 4
100 % 100 % 100 % 0,025 4
100 % 100 % 100 % 0,01 4
50 % 100 % 66,7 % 0,005 4
50 % 100 % 66,7 % 0,003 4
50 % 100 % 66,7 % 0,002 4
100 % 100 % 100 % 0,05 8
100 % 100 % 100 % 0,025 8
50 % 100 % 66,7 % 0,01 8
12,5 % 100 % 22,2 % 0,005 8
12,5 % 100 % 22,2 % 0,003 8
12,5 % 100 % 22,2 % 0,002 8
100 % 100 % 100 % 0,05 16
100 % 100 % 100 % 0,025 16
33,3 % 100 % 50 % 0,01 16
7,7 % 100 % 14,3 % 0,005 16
7,7 % 100 % 14,3 % 0,003 16
7,7 % 100 % 14,3 % 0,002 16
Table A.4: Results for Scenario A with window size: 4000 packets.
Window size: 4000 packets
Precision Recall F-score Eps MinPts
100 % 100 % 100 % 0,05 2
100 % 100 % 100 % 0,025 2
100 % 100 % 100 % 0,01 2
100 % 100 % 100 % 0,005 2
100 % 100 % 100 % 0,003 2
20 % 100 % 33,3 % 0,002 2
100 % 100 % 100 % 0,05 4
100 % 100 % 100 % 0,025 4
100 % 100 % 100 % 0,01 4
100 % 100 % 100 % 0,005 4
66,7 % 100 % 80 % 0,003 4
16,7 % 100 % 28,6 % 0,002 4
100 % 100 % 100 % 0,05 8
100 % 100 % 100 % 0,025 8
100 % 100 % 100 % 0,01 8
25 % 100 % 40 % 0,005 8
20 % 100 % 33,3 % 0,003 8
14,3 % 100 % 25 % 0,002 8
100 % 100 % 100 % 0,05 16
100 % 100 % 100 % 0,025 16
25 % 100 % 40 % 0,01 16
14,3 % 100 % 25 % 0,005 16
14,3 % 100 % 25 % 0,003 16
14,3 % 100 % 25 % 0,002 16
Table A.5: Results for Scenario B with window size: 4000 packets and 95 % normal traffic.
Window size: 4000 packets — 95 % normal
Precision Recall F-score Eps MinPts
NaN 0 % 0 % 0,05 2
NaN 0 % 0 % 0,025 2
100 % 15 % 26,1 % 0,01 2
100 % 25 % 40 % 0,008 2
NaN 0 % 0 % 0,05 4
NaN 0 % 0 % 0,025 4
100 % 15 % 26,1 % 0,01 4
100 % 25 % 40 % 0,008 4
NaN 0 % 0 % 0,05 6
NaN 0 % 0 % 0,025 6
100 % 15 % 26,1 % 0,01 6
100 % 25 % 40 % 0,008 6
NaN 0 % 0 % 0,05 8
NaN 0 % 0 % 0,025 8
100 % 20 % 33,3 % 0,01 8
100 % 32,5 % 49,1 % 0,008 8
Table A.6: Results for Scenario B with window size: 4000 packets and 90 % normal traffic.
Window size: 4000 packets — 90 % normal
Precision Recall F-score Eps MinPts
NaN 0 % 0 % 0,05 2
100 % 2,4 % 4,7 % 0,025 2
100 % 47,6 % 64,5 % 0,01 2
100 % 71,4 % 83,3 % 0,008 2
NaN 0 % 0 % 0,05 4
100 % 4,8 % 9,1 % 0,025 4
100 % 47,6 % 64,5 % 0,01 4
100 % 71,4 % 83,3 % 0,008 4
NaN 0 % 0 % 0,05 6
100 % 4,8 % 9,1 % 0,025 6
100 % 57,1 % 72,7 % 0,01 6
100 % 76,2 % 86,5 % 0,008 6
NaN 0 % 0 % 0,05 8
100 % 4,8 % 9,1 % 0,025 8
100 % 61,9 % 76,5 % 0,01 8
100 % 85,7 % 92,3 % 0,008 8
Table A.7: Results for Scenario B with window size: 4000 packets and 80 % normal traffic.
Window size: 4000 packets — 80 % normal
Precision Recall F-score Eps MinPts
NaN 0 % 0 % 0,05 2
100 % 29,2 % 45,2 % 0,025 2
100 % 91,7 % 95,7 % 0,01 2
100 % 95,8 % 97,9 % 0,008 2
NaN 0 % 0 % 0,05 4
100 % 29,2 % 45,2 % 0,025 4
100 % 91,7 % 95,7 % 0,01 4
100 % 95,8 % 97,9 % 0,008 4
NaN 0 % 0 % 0,05 6
100 % 33,3 % 50 % 0,025 6
100 % 91,7 % 95,7 % 0,01 6
100 % 95,8 % 97,9 % 0,008 6
NaN 0 % 0 % 0,05 8
100 % 33,3 % 50 % 0,025 8
100 % 95,8 % 97,9 % 0,01 8
100 % 100 % 100 % 0,008 8
Table A.8: Results for Scenario B with window size: 4000 packets and 70 % normal traffic.
Window size: 4000 packets — 70 % normal
Precision Recall F-score Eps MinPts
100 % 7,4 % 13,8 % 0,05 2
100 % 77,8 % 87,5 % 0,025 2
100 % 100 % 100 % 0,01 2
100 % 100 % 100 % 0,008 2
100 % 7,4 % 13,8 % 0,05 4
100 % 77,8 % 87,5 % 0,025 4
100 % 100 % 100 % 0,01 4
100 % 100 % 100 % 0,008 4
100 % 7,4 % 13,8 % 0,05 6
100 % 81,5 % 89,8 % 0,025 6
100 % 100 % 100 % 0,01 6
100 % 100 % 100 % 0,008 6
100 % 7,4 % 13,8 % 0,05 8
100 % 81,5 % 89,8 % 0,025 8
100 % 100 % 100 % 0,01 8
100 % 100 % 100 % 0,008 8
Table A.9: Results for Scenario B with window size: 4000 packets and 50 % normal traffic.
Window size: 4000 packets — 50 % normal
Precision Recall F-score Eps MinPts
100 % 39,1 % 56,3 % 0,05 2
100 % 100 % 100 % 0,025 2
100 % 100 % 100 % 0,01 2
100 % 100 % 100 % 0,008 2
100 % 43,5 % 60,6 % 0,05 4
100 % 100 % 100 % 0,025 4
100 % 100 % 100 % 0,01 4
100 % 100 % 100 % 0,008 4
100 % 43,5 % 60,6 % 0,05 6
100 % 100 % 100 % 0,025 6
100 % 100 % 100 % 0,01 6
100 % 100 % 100 % 0,008 6
100 % 47,8 % 64,7 % 0,05 8
100 % 100 % 100 % 0,025 8
100 % 100 % 100 % 0,01 8
100 % 100 % 100 % 0,008 8
Table A.10: Results for Scenario B with window size: 4000 packets and 10 % normal traffic.
Window size: 4000 packets — 10 % normal
Precision Recall F-score Eps MinPts
100 % 100 % 100 % 0,05 2
100 % 100 % 100 % 0,025 2
100 % 100 % 100 % 0,01 2
100 % 100 % 100 % 0,008 2
100 % 100 % 100 % 0,05 4
100 % 100 % 100 % 0,025 4
100 % 100 % 100 % 0,01 4
100 % 100 % 100 % 0,008 4
100 % 100 % 100 % 0,05 6
100 % 100 % 100 % 0,025 6
100 % 100 % 100 % 0,01 6
100 % 100 % 100 % 0,008 6
100 % 100 % 100 % 0,05 8
100 % 100 % 100 % 0,025 8
100 % 100 % 100 % 0,01 8
100 % 100 % 100 % 0,008 8
Table A.11: Results for Scenario B with window size: 500 packets and 90 % normal traffic.
Window size: 500 packets — 90 % normal
Precision Recall F-score Eps MinPts
NaN 0 % 0 % 0,05 2
100 % 9,5 % 17,3 % 0,025 2
100 % 42,2 % 59,4 % 0,01 2
100 % 52,3 % 68,7 % 0,008 2
NaN 0 % 0 % 0,05 4
100 % 9,8 % 17,8 % 0,025 4
100 % 44 % 61,1 % 0,01 4
100 % 53,2 % 69,4 % 0,008 4
NaN 0 % 0 % 0,05 6
100 % 10,3 % 18,8 % 0,025 6
100 % 44,5 % 61,6 % 0,01 6
100 % 54,9 % 70,9 % 0,008 6
NaN 0 % 0 % 0,05 8
100 % 10,9 % 19,7 % 0,025 8
100 % 45,1 % 62,2 % 0,01 8
100 % 55,2 % 71,1 % 0,008 8
Table A.12: Results for Scenario B with window size: 1000 packets and 90 % normal traffic.
Window size: 1000 packets — 90 % normal
Precision Recall F-score Eps MinPts
NaN 0 % 0 % 0,05 2
100 % 5,7 % 10,9 % 0,025 2
97,8 % 52,3 % 68,2 % 0,01 2
98,2 % 62,1 % 76,1 % 0,008 2
NaN 0 % 0 % 0,05 4
100 % 5,7 % 10,9 % 0,025 4
97,8 % 53,4 % 69,1 % 0,01 4
98,2 % 62,6 % 76,5 % 0,008 4
NaN 0 % 0 % 0,05 6
100 % 5,7 % 10,9 % 0,025 6
97,9 % 54,6 % 70,1 % 0,01 6
98,2 % 64,4 % 77,8 % 0,008 6
NaN 0 % 0 % 0,05 8
100 % 6,3 % 11,9 % 0,025 8
94,2 % 56,3 % 70,5 % 0,01 8
95,1 % 66,7 % 78,4 % 0,008 8
Table A.13: Results for Scenario B with window size: 2000 packets and 90 % normal traffic.
Window size: 2000 packets — 90 % normal
Precision Recall F-score Eps MinPts
NaN 0 % 0 % 0,05 2
NaN 0 % 0 % 0,025 2
100 % 44,2 % 61,3 % 0,01 2
100 % 55,8 % 71,6 % 0,008 2
NaN 0 % 0 % 0,05 4
100 % 1,2 % 2,3 % 0,025 4
100 % 51,2 % 67,7 % 0,01 4
100 % 58,1 % 73,5 % 0,008 4
NaN 0 % 0 % 0,05 6
100 % 1,2 % 2,3 % 0,025 6
100 % 53,5 % 69,7 % 0,01 6
100 % 62,8 % 77,1 % 0,008 6
NaN 0 % 0 % 0,05 8
100 % 4,7 % 8,9 % 0,025 8
96 % 55,8 % 70,6 % 0,01 8
96,7 % 67,4 % 79,5 % 0,008 8
REFERENCES
[1] J. Kennedy, “Stuxnet worm hits Iran nuclear plant staff computers.” https://www.siliconrepublic.com/enterprise/cyber-attack-stuxnet-worm-hits-iranian-nuclear-plant, January 2010. Accessed: 2019-04-16.
[2] P. Mueller and B. Yadegari, “The Stuxnet worm,” Department of Computer Science, University of Arizona, 2012.
[3] D. Ding, Q. Han, Y. Xiang, X. Ge, and X. M. Zhang, “A survey on security con-
trol and attack detection for industrial cyber-physical systems,” Neurocomputing,
vol. 275, pp. 1674–1683, 2018.
[4] Y. Mo and B. Sinopoli, “False data injection attacks in control systems,” Preprints
of the 1st workshop on Secure Control Systems, pp. 1–6, 2010.
[5] A. Meshram and C. Haas, “Anomaly detection in industrial networks using machine
learning: A roadmap,” in Machine Learning for Cyber Physical Systems, (Berlin),
pp. 65–72, Springer Vieweg, 2016.
[6] Y. Hu, A. Yang, and H. Li, “A survey of intrusion detection on industrial control
systems,” International Journal of Distributed Sensor Networks, vol. 14, 2018.
[7] A. Paul, F. Schuster, and H. König, “Towards the protection of industrial con-
trol systems - conclusions of a vulnerability analysis of Profinet IO,” in Detection
of Intrusions and Malware, and Vulnerability Assessment, (Berlin), pp. 160–176,
Springer, 2013.
[8] P. García-Teodoro, J. Díaz-Verdejo, G. Maciá-Fernández, and E. Vázquez,
“Anomaly-based network intrusion detection: Techniques, systems and challenges,”
Computers & Security, vol. 28, pp. 18–28, 2009.
[9] F. Schuster, A. Paul, and H. König, “Towards learning normality for anomaly de-
tection in industrial control networks,” in Emerging Management Mechanisms for
the Future Internet, (Berlin), pp. 61–72, Springer, 2013.
[10] “Profinet - the leading industrial ethernet standard.”
https://www.profibus.com/technology/profinet/. Accessed: 2019-04-16.
[11] E. Ahmed, A. Ahmed, I. Yaqoob, J. Shuja, A. Gani, M. Imran, and M. Shoaib,
“Bringing computation closer towards user network: Is edge computing the solu-
tion?,” IEEE Communications Magazine, 2017.
[12] H. J. Liao, C. H. R. Lin, Y. C. Lin, and K. Tung, “Intrusion detection system:
A comprehensive review,” Journal of Network and Computer Applications, vol. 36,
pp. 16–24, 2013.
[13] C. Wressnegger, A. Kellner, and K. Rieck, “Zoe: Content-based anomaly detection
for industrial control systems,” in 2018 48th Annual IEEE/IFIP International Con-
ference on Dependable Systems and Networks (DSN), (Luxembourg City), pp. 127–
138, 2018.
[14] M. Mantere, M. Sailio, and S. Noponen, “Network traffic features for anomaly de-
tection in specific industrial control system network,” Future Internet, pp. 460–473,
2013.
[15] G. S. Sestito, A. C. Tucato, A. L. Dias, M. S. Rocha, M. M. da Silva, P. Ferrari,
and D. Brandao, “A method for anomalies detection in real-time ethernet data
traffic applied to profinet,” IEEE Transactions on Industrial Informatics, vol. 14,
pp. 2171–2180, 2018.
[16] F. Schuster, A. Paul, R. Rietz, and H. König, “Potentials of using one-class SVM for
detecting protocol-specific anomalies in industrial networks,” in 2015 IEEE Symposium
Series on Computational Intelligence, (Cape Town), 2015.
[17] P. Mulinka and P. Casas, “Stream-based machine learning for network security and
anomaly detection,” in Proceedings of the 2018 Workshop on Big Data Analytics and
Machine Learning for Data Communication Networks, (New York), pp. 1–7, 2018.
[18] Profinet, “Profinet technology and application - system description.” https://www.profibus.com/index.php?eID=dumpFile&t=f&f=82430&token=7cbb78f5ba6b3e17762ab594f803f1901eb24fdf, November 2018. Accessed: 2019-04-18.
[19] P. Thomas, “An introduction to profinet frame analysis using wireshark.”
https://profibusgroup.files.wordpress.com/2013/01/w4-profinet-frame-analysis-
peter-thomas.pdf, May 2013. Accessed: 2019-04-18.
[20] Profinet, “Profinet system description - open solution for the world of automation.” https://www2.mmu.ac.uk/media/mmuacuk/content/documents/ascent/B01_PROFINET_system_en.pdf, September 2010. Accessed: 2019-04-18.
[21] Z. Lin and S. Pearson, “An inside look at industrial ethernet communication pro-
tocols.” http://www.ti.com/lit/wp/spry254b/spry254b.pdf, July 2018. Accessed:
2019-04-19.
[22] J. Scheible and A. Lu, “Anomaly detection on the edge,” in MILCOM 2017 - 2017
IEEE Military Communications Conference (MILCOM), (Baltimore), IEEE, 2017.
[23] J. Zhang and M. Zulkernine, “Anomaly based network intrusion detection with
unsupervised outlier detection,” in 2006 IEEE International Conference on Com-
munications, (Istanbul), 2006.
[24] L. Golab and M. T. Ozsu, “Issues in data stream management,” ACM Sigmod
Record, vol. 32, pp. 5–14, 2003.
[25] C. Xu, Scalable Validation of Data Streams. PhD thesis, Uppsala Universitet, Upp-
sala, 2016.
[26] P. Domingos, “A few useful things to know about machine learning,” Communica-
tions of the ACM, pp. 78–87, 2012.
[27] I. Souiden, Z. Brahmi, and H. Toumi, “A survey on outlier detection in the context of
stream mining: review of existing approaches and recommendations,” in International
Conference on Intelligent Systems Design and Applications, pp. 372–383, Springer,
Cham, 2016.
[28] A. K. Jain, M. N. Murty, and P. J. Flynn, “Data clustering: a review,” ACM
Computing Surveys, pp. 264–323, 1999.
[29] R. Xu and D. Wunsch, “Survey of clustering algorithms,” IEEE Transactions on
Neural Networks, vol. 16, pp. 645–678, 2005.
[30] K. Leung and C. Leckie, “Unsupervised anomaly detection in network intrusion de-
tection using clusters,” in Proceedings of the Twenty-eighth Australasian conference
on Computer Science, vol. 38, (Newcastle), pp. 333–342, 2005.
[31] Y. Chen and L. Tu, “Density-based clustering for real-time stream data,” in Pro-
ceedings of the 13th ACM SIGKDD international conference on Knowledge discovery
and data mining, (New York), pp. 133–142, ACM, 2007.
[32] E. Schubert, J. Sander, M. Ester, H. P. Kriegel, and X. Xu, “Dbscan revisited, revis-
ited: Why and how you should (still) use dbscan,” ACM Transactions on Database
Systems, vol. 42, pp. 1–21, 2017.
[33] M. Baud and M. Felser, “Profinet IO-device emulator based on the man-in-the-middle
attack,” in Emerging Technologies and Factory Automation, (Prague), 2006.
[34] M. Thottan, G. Liu, and C. Ji, “Anomaly detection approaches for communication
networks,” in Algorithms for Next Generation Networks, (London), pp. 239–261,
2010.
[35] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: A survey,” ACM
computing surveys (CSUR), p. 15, 2009.
[36] P. Laskov, P. Düssel, C. Schäfer, and K. Rieck, “Learning intrusion detection: Super-
vised or unsupervised?,” in Image Analysis and Processing - ICIAP 2005, (Berlin),
pp. 50–57, 2005.
[37] M. Goldstein and S. Uchida, “A comparative evaluation of unsupervised anomaly
detection algorithms for multivariate data,” PloS one, p. 15, 2016.
[38] S. Omar, A. Ngadi, and H. H. Jebur, “Machine learning techniques for anomaly
detection: an overview,” International Journal of Computer Applications, vol. 79,
2013.
[39] R. H. Moulton, H. L. Viktor, N. Japkowicz, and J. Gama, “Clustering in the presence
of concept drift,” in Joint European Conference on Machine Learning and Knowledge
Discovery in Databases, pp. 339–355, 2018.
[40] F. Cao, M. Estert, W. Qian, and A. Zhou, “Density-based clustering over an evolving
data stream with noise,” in Proceedings of the 2006 SIAM international conference
on data mining, pp. 328–339, Society for Industrial and Applied Mathematics, 2006.
[41] A. T. Tran, “Network anomaly detection,” 2017.
[42] J. Hurwitz and D. Kirsch, Machine Learning For Dummies, IBM Limited Edition.
Wiley, 2018.