
Mälardalen University
School of Innovation Design and Engineering

Västerås, Sweden

Thesis for the Degree of Master of Science in Engineering - Embedded Systems 15.0 credits

DATA DRIVEN ANOMALY CONTROL DETECTION FOR RAILWAY PROPULSION CONTROL SYSTEMS

Ajna Hodžić

Dženita Škulj

Examiner: Thomas Nolte
Mälardalen University, Västerås, Sweden

Supervisor: Aida Čaušević
Mälardalen University, Västerås, Sweden

Claes Lindskog,
Bombardier Transportation, Västerås, Sweden

June 9, 2020


Abstract

The popularity of railway transportation has been on the rise over the past decades, as it has been able to provide safe, reliable, and highly available service. The main challenge within this domain is to reduce the costs of preventive maintenance and improve operational efficiency. To tackle these challenges, one needs to investigate and provide new approaches that enable quick and timely data collection, transfer, and storage, aiming at easier and faster analysis whenever needed.

In this thesis, we aim at enabling the monitoring and analysis of signal data collected from a train propulsion system. The main idea is to monitor and analyze signal data gathered during the regular operation of the propulsion control unit, or data recorded during regular train tests in the real-time simulator. To do so, we have implemented a solution that collects train signal data and stores it in .txt and .CSV files, to be analyzed in the edge node and, in the future, sent to the cloud for further analysis. In our analysis, we focus on identifying signal anomalies and predicting potential failures using MathWorks tools. Two machine learning techniques, unsupervised and supervised learning, are implemented. Additionally, we have investigated how data can be managed efficiently. We have also reviewed existing edge computing solutions and anomaly detection approaches, using a survey as a suitable method to identify relevant works within the state of the art.




Table of Contents

1 Introduction

2 Background
2.1 Internet of Things
2.2 Edge Computing
2.3 Digital Signal Processing
2.4 Anomaly Detection
2.5 Machine Learning

3 Problem Formulation
3.1 Method

4 Edge Computing - A Survey on Existing Approaches
4.1 Edge Computing Systems
4.2 Edge Computing in IoT
4.3 Edge Computing in Industrial Control Systems
4.4 Edge Computing in Railway Systems

5 Anomaly Detection in Control Systems - A Survey on Existing Approaches
5.1 Machine Learning in Context of Anomaly Detection
5.1.1 Supervised Learning
5.1.2 Unsupervised Learning
5.1.3 Semi-supervised Learning
5.2 Anomaly Detection in Industrial Control Systems
5.3 Anomaly Detection in Railway Systems
5.4 Anomaly Detection in Networks

6 Overview
6.1 MSLL protocol

7 Our Approach
7.1 Anomaly Detection
7.1.1 Unsupervised Learning
7.1.2 Supervised Learning

8 Related Work

9 Conclusions




List of Figures

1 An example of IoT architecture [1]
2 Layered model for cloud edge-based IoT services delivery [2]
3 Typical digital signal processing setup [3]
4 An example of anomalies in two-dimensional data set [4]
5 (a) Point anomalies, o1, o2 and O3 [5] (b) Contextual anomaly [6] (c) Collective anomaly [5]
6 Steps for writing surveys on existing approaches
7 Phases for conducting case study
8 The problem space of Edge Computing-based IoT [7]
9 Global overview of RELIANCE project [8]
10 The physical connection of the server and client
11 Activity diagram for Linux Client - Server communication
12 An example of a text file
13 An example of an excel file
14 Position of our thesis in RELIANCE project
15 Rd locomotive [9]
16 Phases in Anomaly Detection
17 (a) Evaluation of each principal component (b) The plot of the first and second components of PCA for one data-set (c) Density of the data distribution explained by first and second principal components
18 (a) Real and predicted signal of current (b) Error between real values and predicted values of current (c) Maximum error between real and predicted values of current
19 (a) Real and predicted signal of current (b) Error between real values and predicted values of current

23 Amplitude spectrum of current
24 (a) Scatter plot of Tree model predictions (b) Confusion matrix of Tree model
25 (a) Scatter plot of Naive Bayes predictions (b) Scatter plot of Ensemble predictions (c) Confusion matrix of Naive Bayes model




List of Abbreviations

AC Alternating Current
DC Direct Current
PCU Propulsion Control Unit
IoT Internet of Things
EC Edge Computing
IT Information Technology
RAN Radio Access Network
DSP Digital Signal Processor
AD Analog-to-Digital
DA Digital-to-Analog
ML Machine Learning
AI Artificial Intelligence
ECG Electrocardiogram
FL Federated Learning
LoRa Long Range
TDAP Train Data Analysis Platform
SDN Software Defined Networks
NFV Networks Function Virtualization
ICS Industrial Control Systems
DCGAN Deep Convolutional Generative Adversarial Network
WSN Wireless Sensor Network
BT Bombardier Transportation
PPC Propulsion Control
TCMS Train Control and Management System
MSLL Multiple Session Low-latency
UDP User Datagram Protocol
IP Internet Protocol
RTS Real-Time Simulator
PCA Principal Component Analysis
KNN K Nearest Neighbor
SVM Support Vector Machine




1 Introduction

Rail transportation moves both passengers and goods on wheeled vehicles running on rails. The vehicles travel on a prepared, flat surface and are guided by the tracks. Over the past decades, this kind of transportation has been gaining popularity due to its high level of safety, reliability, and availability, and its ability to provide punctual and predictable service. It is used for public transportation as well as for industrial purposes.

The first steam-powered locomotive was built and presented in 1804. It took a few decades before the introduction of the first electric locomotive in 1837 [10], which eventually led to the construction of the first electric railway in 1879. The next achievement in this domain was the first electric tram line, built by Siemens in 1881 near Berlin, Germany. It was powered by a 180 V direct current (DC) supply. In 1891, Charles Brown presented the first alternating current (AC) electric locomotive in Zürich, Switzerland. In the following years, given that AC power showed great advantages such as high output and long power supply distances, the focus was on the development of three-phase railways in a few Central European countries. The design of the overhead lines and the limited possibilities to control the working speed of the motors were drawbacks of three-phase railways that needed to be overcome. The solution was the use of single-phase alternating current.

In Sweden, the first trial with electric traction took place in 1905, and lines are electrified with single-phase alternating current at 15 kV and 16 2/3 Hz. Today, countries use different electric traction systems (i.e., input voltage, control, and traction motor), which makes international railway travel very difficult: trains need to be equipped with dual or multiple systems (e.g., 15 kV AC 16 2/3 Hz, 25 kV AC 50 Hz, DC 750 V, or DC 3 kV) [11].

Today, public railway transportation is becoming even more important in light of environmental and climate problems such as pollution, city congestion, and energy consumption [12]. However, the public transportation system still has room for improvement. Vehicle failures are rare but inevitable, and when they happen they can cause long train delays that cost both passengers and service providers time and money. This results in decreased reliability, efficiency, and availability. One way to deal with potential failures is to invest in an appropriate approach to predictive maintenance of trains in operation.

In this thesis, we focus on enabling an approach for predictive maintenance and anomaly detection for the train Propulsion Control Unit (PCU), with a case study based on the Bombardier Transportation MITRAC PCU. The MITRAC propulsion control system is a Bombardier Transportation product used in railway vehicles, and is considered one of the most competitive propulsion and control products currently on the market. Bombardier MITRAC delivers the following equipment [13]:

• Drives that provide the traction effort;

• Electronics to manage different types of communication on a train, as well as to control the propulsion system;

• Traction Converters to transform power from catenary overhead power line or diesel engines;

• Auxiliary Converters used to supply power to doors, lighting, air-conditioning, etc.

The main functionality of the PCU is power processing and management of the entire system [14]. Furthermore, the PCU improves the safety, reliability, and efficiency of vehicle operation by monitoring the system and detecting malfunctions of related equipment [15].

The PCU collects a large amount of data from the system and its environment. Sensors transmit signals to the PCU, which stores the collected data until it is transferred to the data center; this transfer takes a lot of time and sometimes requires manual effort. Moreover, diagnostics are usually performed offline in data centers. When a failure happens, it is usually very challenging to detect the system malfunction within the limited time available because, in most unwanted situations, some data may be omitted. To tackle this challenge, it is necessary to enable automatic real-time data analysis.



Cloud computing has been introduced as one way to automatically handle large amounts of data [16]. It promises scalability, elasticity of resources, on-demand provisioning, and a pay-as-you-go cost model. To be analysed in this setting, the data must be sent to and stored in the cloud. In systems like the PCU, the amount of gathered data may sometimes be unsuitable for the cloud, so edge computing is being introduced as an alternative way to tackle this issue [17]. To achieve real-time data streams in such a system, the amount of data needs to be reduced. Edge computing offers an efficient way of handling large amounts of data and of reducing them by distinguishing between important and less important data [18]. Machine learning (ML) has been identified as a promising way to handle large amounts of data efficiently in the edge. It has been used in various domains where data has to be handled [19]. Within the edge computing domain, ML algorithms can be used to select the important information, so that the total amount of data is significantly reduced.

In this thesis, we investigate existing edge computing solutions and anomaly detection approaches, using a survey as a suitable method to identify relevant works within the state of the art. Moreover, to provide means for the analysis, we have implemented a component called Linux Client, which acquires data from PCU sensors and transfers it to the edge for analysis. Finally, we have completed the approach by connecting Linux Client to MATLAB in order to apply suitable machine learning techniques for anomaly detection to the provided data set coming from the railway propulsion system.

The rest of the thesis is organized as follows. Section 2 introduces the necessary background. In Section 3 we define the focus of our thesis and the methods used to answer the research questions. In Section 4 we present the edge computing survey and explore existing edge computing systems related to our thesis. Section 5 is dedicated to the second survey, on anomaly detection in control systems. In Section 6, we explain the real system setup to which our thesis contributes. Section 7 presents our approach and methods for detecting anomalous behavior of the current signal in the railway propulsion system, together with an extensive explanation of the results. Existing related work is presented in Section 8. The conclusions and potential future work are presented in Section 9.



2 Background

This section gives a basic explanation of the key concepts used in this thesis. Section 2.1 describes the Internet of Things and how the data is usually processed. Section 2.2 describes edge computing and compares it with cloud computing. Section 2.3 briefly explains digital signal processing. Section 2.4 introduces anomaly detection in the signal domain, and Section 2.5 describes machine learning algorithms, as they are a useful tool for automated data analysis.

2.1 Internet of Things

The Internet of Things (IoT) is a global infrastructure that enables the physical and virtual interconnection of information and communication technologies [20]. Nowadays, numerous devices are interconnected via a network and can exchange data with each other [21]. By combining telecommunications, informatics, electronics, and social science, IoT enables the integration of several technologies [22]. The IoT enables the integration of products and devices from different fields of knowledge to achieve a common goal.

The IoT is a product of the rapid growth of technology, and it is expanding into all areas of life. The applications of IoT are split into four groups: consumer, commercial, industrial, and infrastructure spaces [23]. Some products that IoT brings to us are smart vehicles, smart buildings, smart house devices, etc. [24]. IoT is increasingly used in control systems [25]. An example of IoT architecture is shown in Figure 1.

Figure 1: An example of IoT architecture [1]

Sensors are devices that gather data from their environment. This data holds a lot of information, but it requires further processing to provide useful input for the user. IoT devices usually do not support data processing, as they have limited computational and energy resources. To resolve this challenge, the data is offloaded to the cloud or a data center [26]. After the computations are done in the data center, the actuators receive the commands to be performed.

In an industry such as railway, the number of sensors can be large, as can the amount of data they generate. Transmitting a large amount of data occupies a lot of network and bandwidth resources [7]. Another challenge is that data transmission to a faraway data center can cause significant network latency. To perform real-time analysis of enormous amounts of data, the amount of data should be reduced and data transmission optimized. One solution for reducing the data is to downsample it, but this risks missing important information and events [18]. To tackle this challenge, edge computing is introduced to reduce data transmission time and perform efficient data analysis.
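The risk of downsampling mentioned above can be illustrated with a minimal Python sketch. The signal values, the decimation factor, and the helper names `downsample` and `keep_deviations` are assumptions made for illustration only; they are not taken from the system studied in this thesis.

```python
def downsample(samples, factor):
    """Keep every `factor`-th sample (naive decimation, no filtering)."""
    return samples[::factor]


def keep_deviations(samples, baseline, tolerance):
    """Keep (index, value) pairs whose distance from `baseline` exceeds `tolerance`."""
    return [(i, x) for i, x in enumerate(samples) if abs(x - baseline) > tolerance]


# Hypothetical signal: a constant level of 10.0 with one short spike at index 37.
signal = [10.0] * 100
signal[37] = 25.0

decimated = downsample(signal, 10)            # keeps indices 0, 10, 20, ..., 90
events = keep_deviations(signal, 10.0, 5.0)   # keeps only samples far from the baseline

print(len(decimated), max(decimated))  # the spike is not among the kept samples
print(events)                          # the spike survives event-based reduction
```

Both reductions shrink the data considerably, but only the second one, which inspects every sample near the source, preserves the short event.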



2.2 Edge Computing

Edge computing (EC) is a new paradigm that extends cloud capabilities by placing computing and storage resources at the edge of the network, in close proximity to end devices such as sensors, thus supporting a new variety of services and applications [27]. Cloud computing enables the processing and storage of large amounts of data and information in the cloud [28]. End devices have limited storage and computation capabilities, so they offload data and computational tasks to the cloud. The cloud may be far away from the devices, hence the service may not be provided in real time.

EC enables the processing and storage of huge amounts of data and information from devices in a local area. The edge is in close proximity to the devices, and parts of the processing are moved to it [29]. Between the data source and the cloud database there is an edge device, which can be any device capable of processing the data [30]. These edge devices are also called edge nodes. They can be used either to take over some tasks from the user device, which is referred to as task offloading, or to take over parts of the tasks of the cloud data center. Applications that require high bandwidth and low latency are processed near the data source. Figure 2 shows a layered model for cloud edge-based IoT service delivery [2]. It has the same layers as the basic EC structure [7]. IoT devices are the end devices in EC. They are connected to edge gateway nodes, where a lot of real-time data processing and analysis happens. The edge gateway nodes are used only as temporary data storage because they do not have a lot of storage resources. Furthermore, the edge nodes communicate with the cloud. Long-term data analysis takes place in the cloud, since the cloud has the computational resources to conduct complex data analysis. The cloud also provides long-term data storage since it has a large storage capacity.

Figure 2: Layered model for cloud edge-based IoT services delivery [2]
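The division of labour between edge gateway nodes and the cloud described above can be sketched as follows. This is a minimal illustration under stated assumptions: the class `EdgeNode`, its window size, the alarm threshold, and the list `uplink` that stands in for messages sent to the cloud are all hypothetical and do not describe any particular EC platform.

```python
from collections import deque


class EdgeNode:
    """Keeps a bounded buffer of recent samples and forwards compact summaries."""

    def __init__(self, window, limit):
        self.buffer = deque(maxlen=window)  # temporary storage only
        self.limit = limit                  # hypothetical alarm threshold
        self.uplink = []                    # stands in for messages to the cloud

    def ingest(self, sample):
        """Store one sample; when the window is full, summarize and flush it."""
        self.buffer.append(sample)
        if len(self.buffer) == self.buffer.maxlen:
            values = list(self.buffer)
            self.uplink.append({
                "min": min(values),
                "max": max(values),
                "mean": sum(values) / len(values),
                "alarm": max(values) > self.limit,
            })
            self.buffer.clear()


node = EdgeNode(window=50, limit=20.0)
stream = [10.0] * 200
stream[120] = 30.0  # hypothetical fault in the third window
for s in stream:
    node.ingest(s)

print(len(stream), "samples reduced to", len(node.uplink), "messages")
```

In this sketch the edge node performs the real-time check on each window and keeps no long-term history, while the summaries it forwards are what the cloud would store and analyse over the long term.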

The main attributes of EC are:

• Low latency and close proximity - Due to the close proximity of devices and edge servers, end-to-end delay and response delay are reduced.

• Location awareness - Edge servers collect and process data gathered from devices on the basis of their geographical location without sending it to the cloud.

• Network context awareness - Edge servers are able to acquire network context information in order to adapt and respond to varying network conditions and devices, which results in optimized network resource utilization [2, 29].

Besides EC, other computing concepts exist. They are explained briefly below to clarify the connections and differences between them.



Grid computing is the earliest of these computing concepts, introduced in the 1990s [28]. It refers to a group of computers, coordinated by a primary machine, that work in parallel to process data. It is usually used for computationally expensive calculations that can be parallelized, that is, divided into tasks. After completing a task, each computer sends its result to the primary machine, which processes the partial results and produces the final result.

Cloud computing is a paradigm that refers to a high-capacity data center with computing capabilities that can be accessed at any time from anywhere through the Internet [28]. Different types of cloud computing differ mainly in their scale and security level. In cloud computing, the resources are not accessed directly but through a service, which manages a large number of resources and allocates them on demand.

Mobile cloud computing is an extension of cloud computing that emerged with the rise of mobile devices [31]. Computationally demanding tasks are offloaded from mobile devices to the cloud, meaning that it inherits all the advantages of cloud computing.

Fog computing is similar to EC, and its main idea is to process the data before sending it to the cloud [32]. In fog computing, the computing resources are placed near the network edge, which is the main difference between these two computing concepts.

Multi-access Edge Computing is an extension of EC that offers IT services and cloud computing capabilities in the edge [33]. This concept utilizes the Radio Access Network (RAN), which provides real-time radio network information suitable for optimization. Since the computing resources are near the end user, the network latency is ultra-low.
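Of the concepts above, the grid computing pattern (a primary machine divides a calculation into tasks, workers process them in parallel, and the primary machine combines the partial results) is the easiest to show in code. The following Python sketch is only an analogy: threads on one machine stand in for the separate computers of a grid, and the function names and the sum-of-squares workload are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum_of_squares(chunk):
    """Task executed by one worker: process its share of the data."""
    return sum(x * x for x in chunk)


def split(data, n_tasks):
    """Divide `data` into at most `n_tasks` contiguous chunks of similar size."""
    size = max(1, -(-len(data) // n_tasks))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def primary(data, n_workers=4):
    """Primary machine: distribute tasks, collect partial results, combine them."""
    chunks = split(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    return sum(partials)


total = primary(list(range(1, 101)))
print(total)
```

The combined result is the same as the one obtained by processing the whole data set on a single machine; only the distribution of work differs.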

2.3 Digital Signal Processing

Digital Signal Processing (DSP) covers the processing and modification of signals in the digitized, discrete domain. Signals are processed digitally in order to filter them so that they are easier to analyze, and to compress them for faster transmission and simpler detection of errors [34]. Since numerous calculations must be done for signal processing, DSP has high demands for fast computing. Today, there are various applications of DSP such as:

• Image processing,

• Video processing,

• Control systems,

• Seismology,

• Feature extraction, such as image understanding and speech recognition, etc.

Real-life signals are analog; to be suitable for digital processing, they must be sampled. Sampling of analog signals is done with dedicated hardware, analog-to-digital (AD) converters. A typical digital signal processing setup is shown in Figure 3 [3]. The analog signal is converted to a digital signal with an AD converter. The obtained signal is processed by a digital signal processor (DSP). Since the applications require analog signals, the signal is then converted back with a digital-to-analog (DA) converter. Finally, the analog signal is filtered with a low-pass filter to smooth it and reduce fluctuations.
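Two stages of this chain, sampling and low-pass smoothing, can be imitated in a short Python sketch. It is a simplification under stated assumptions: the analog input is modelled as a hypothetical function of time, the DA conversion is not modelled, and the low-pass stage is approximated digitally by a moving average rather than by an analog filter.

```python
import math


def sample(analog, duration_s, rate_hz):
    """AD step: evaluate the continuous-time function at uniform instants."""
    n = int(duration_s * rate_hz)
    return [analog(k / rate_hz) for k in range(n)]


def moving_average(samples, width):
    """Simple low-pass filter: average each sample with its predecessors."""
    out = []
    for k in range(len(samples)):
        window = samples[max(0, k - width + 1):k + 1]
        out.append(sum(window) / len(window))
    return out


def analog(t):
    """Hypothetical input: a 5 Hz tone plus a weaker 200 Hz disturbance."""
    return math.sin(2 * math.pi * 5 * t) + 0.3 * math.sin(2 * math.pi * 200 * t)


digital = sample(analog, duration_s=1.0, rate_hz=1000)
smoothed = moving_average(digital, width=5)
```

At a sampling rate of 1000 Hz, one period of the 200 Hz disturbance spans exactly five samples, so a five-sample average cancels it while leaving the slow 5 Hz tone almost unchanged, delayed by about two samples.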

2.4 Anomaly Detection

In the domain of signals, an anomaly is a deviation from the expected values of a signal. Anomaly detection is therefore the identification of deviations or unusual patterns in the data [5]. Anomaly detection has been used in various fields, such as fault detection in Industrial Control Systems, attack detection in Cyber Security Systems, and detection of different types of disease in medicine. Figure 4 shows an example of anomalies in a two-dimensional data set where X and Y are the data from two different sensors. Visualizing the data reveals its distribution; the points that deviate from the others are considered anomalies.

Considering the various forms of anomalies that appear in signals, anomalies can be classified into three main types [5]:



Figure 3: Typical digital signal processing setup [3]

Figure 4: An example of anomalies in two-dimensional data set [4]

1. Point anomalies

2. Contextual anomalies

3. Collective anomalies

A point anomaly, see Figure 5a, is the simplest type of anomaly: a single sample of the data that deviates from the rest of the data [5].

A contextual anomaly, see Figure 5b, is a set of points that shows abnormal behavior when considered in the context in which it appears [35]. This means that a set of points can be an anomaly in one context but not in another.



Figure 5: (a) Point anomalies, o1, o2 and O3 [5] (b) Contextual anomaly [6] (c) Collective anomaly [5]

A collective anomaly, see Figure 5c, is a set of points that shows abnormal behavior when considered together with the rest of the data (the entire data set) [35].
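Of the three types, point anomalies are the simplest to detect automatically. The following Python sketch flags samples that lie far from the mean in units of standard deviation (a z-score test). The threshold of 3 and the sensor readings are assumptions chosen for illustration; contextual and collective anomalies need additional information about context or neighbouring samples and are not covered by this test.

```python
import statistics


def point_anomalies(samples, threshold=3.0):
    """Return indices of samples more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(samples)
    std = statistics.pstdev(samples)
    if std == 0:
        return []  # a constant signal has no deviating samples
    return [i for i, x in enumerate(samples) if abs(x - mean) / std > threshold]


# Hypothetical sensor readings with one deviating sample (17.5).
readings = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 10.1,
            9.9, 10.0, 10.2, 9.8, 10.0, 17.5, 10.1, 9.9, 10.0, 10.0]

print(point_anomalies(readings))
```

Because the mean and standard deviation are themselves affected by outliers, this simple test works best when anomalies are rare compared with the number of normal samples.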

2.5 Machine Learning

Machine Learning (ML) is a branch of Artificial Intelligence (AI) in which a machine learns automatically and improves itself based on experience, without human intervention [36]. The rapid growth of technology puts high demands on fast processing and analysis of data. ML is one of the most effective methods in the field of data analysis. It is often used to predict the future behavior of a machine or system using previously collected data. The phase in which the ML model is created from previously collected data is called the training phase, while in the testing phase the trained model is used to process new data [37]. The most important step in the first phase of ML is feature selection [38].

ML algorithms are non-interactive algorithms. They are widely used, and there are different types of ML algorithms [39]:
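The separation between the training phase and the testing phase can be made concrete with a very small classifier. The Python sketch below uses a nearest-centroid rule, chosen here only because it fits in a few lines; it is not one of the models used later in this thesis, and the two features and the labels are hypothetical.

```python
def train(features, labels):
    """Training phase: compute one centroid (mean feature vector) per class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(model, x):
    """Testing phase: assign the class whose centroid is closest to `x`."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))
    return min(model, key=lambda y: sq_dist(model[y]))


# Hypothetical two-feature samples (e.g. mean and peak of a signal window).
train_x = [[1.0, 1.2], [0.9, 1.0], [1.1, 0.8], [4.0, 4.2], [3.8, 4.1], [4.2, 3.9]]
train_y = ["normal", "normal", "normal", "faulty", "faulty", "faulty"]

model = train(train_x, train_y)
print(predict(model, [1.0, 0.9]), predict(model, [3.9, 4.0]))
```

The model is built once from previously collected, labelled data and is then applied unchanged to new samples, which mirrors the two phases described above.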

1. Supervised learning

2. Unsupervised learning

3. Semi-supervised learning

4. Reinforcement learning

5. Transduction




6. Learning to learn

Supervised, unsupervised, and semi-supervised learning are explained in more detail in Section 5.1.

One application of ML is in video games. For example, ML can be used for solving mazes, where the goal is to find the shortest path to the exit. This is where the reinforcement learning technique is applicable. Furthermore, ML is very useful for problems where theoretical knowledge is fragmentary, but it requires an enormous amount of data and numerous measurements [40]. Hence, ML algorithms can be used for the detection of patterns in signals, as well as for the prediction of future patterns. In this regard, ML has found its way into the field of anomaly detection, which has been tackled by many researchers. For example, ML can be used to find anomalies in biomedical signals such as ECG to detect heart rhythm disorders, or to determine whether a person has cancer. Another example is detecting anomalies in industrial systems by applying ML algorithms to the data from different sensors.



3 Problem Formulation

One of the most important efforts in the railway industry is to decrease the cost and time needed for maintenance while increasing reliability and safety at the same time. A number of research works focus on proposing solutions that enable predictive maintenance and anomaly detection. The main challenge these solutions face is handling the large amounts of data collected from sensors deployed on the train. To detect failures in a timely manner, data analysis has to be performed. Because it is difficult to run the analysis on large amounts of data, the data needs to be pre-processed and reduced to its most relevant part. Despite many efforts, we still lack a method that efficiently extracts the valuable and important information and enables analysis for timely anomaly detection. In addition, it is important to preserve the existing and expected functionality of the control systems at the same time.

To tackle the research challenges identified above, in this thesis we aim to answer the following research questions:

RQ1 Which existing approaches are addressing data acquisition in railway systems?

RQ2 Which existing approaches are addressing anomaly detection for recorded signals in the railway domain?

RQ3 What are the prerequisites to implement EC based on the anomaly detection approach for MITRAC PCU?

RQ3a What type of analysis methodology can be deployed to enable anomaly detection?

3.1 Method

In this thesis, we have used two research methods, namely survey and case study. The first is used to understand the state of the art in the domain to which the identified challenges relate, while the second is used to analyse recorded signals and enable anomaly detection.

• Survey focused on existing approaches concerning edge computing systems in general, IoT, industrial control systems, and railway systems, and a second survey on existing anomaly detection approaches in control systems, including ML approaches and a review of anomaly detection in networks and railway systems. The survey, as a research method, aims to collect standardized information from a specific field [41]. To conduct the surveys we reviewed multiple papers by following the simplified protocol defined in [41] and [42]. The first step was writing research questions, and the second was to define a reviewing protocol. This protocol included the search process, inclusion and exclusion criteria, data extraction, and synthesis of extracted data.

We started the survey on existing approaches in EC by searching for papers with the keywords edge computing. We realized that this approach would yield a large number of papers in several EC domains, so we decided to filter the papers by adding new keywords to the search. Since the thesis fits into three domains, the keywords IoT, industrial control systems, and railway systems were added to the search. We selected two databases, IEEE and Springer, for paper selection, since we consider them the most relevant databases. By filtering the papers this way, we significantly narrowed down the number of papers. At this point, we had more than 60 selected papers. In the next step, we selected papers based on their title, followed by reading the abstracts. Finally, we carried out a full reading of 35 selected papers.

A survey on existing approaches to anomaly detection in control systems was conducted in a similar way. The survey started with a search for papers with the keywords anomaly detection, followed by adding the new keywords industrial control systems, railway systems, and networks. Afterward, we narrowed down the gathered papers by selecting two databases and filtering the papers based on their title and abstract. Filtering resulted in 27 relevant papers for full reading, related to the survey on anomaly detection. The flow of conducting both surveys is presented in Figure 6.

Figure 6: Steps for writing surveys on existing approaches

• Case study method enabled us to define a model and implement the optimization algorithm needed to detect anomalies in selected signals. Since the thesis tackles a real system, a case study is a suitable method. A case study is an empirical method that uses multiple sources to gather evidence [43]. It starts with an analysis of the research field and continues with collecting the data.

The steps in the case study follow the guidelines by Runeson et al. [43] and are divided into four phases:

1. Case study design - planning the case study;

2. Preparation for data collection - procedures for data collections are defined;

3. Collecting evidence - execution with data collection;

4. Analysis of collected data.

In the first phase, in collaboration with domain experts, we have decided which locomotivemodel to use for collecting the data, and signals of interest for implementation and testing.Also, we have defined principles for implementation, which enabled us to move to the secondphase.

In the second phase, we implemented the Linux Client to enable data collection and make it possible to analyze and process signals, as well as to detect anomalies.

In the third phase, we used the Linux Client to collect the data in the RTS lab. The main reason for using a simulation environment for data collection was the time limitation and the cost of deploying our Linux Client in the real environment.

In the last phase of the case study, we established a connection to MATLAB and loaded the collected data into MATLAB for analysis. Using the collected data, we trained two ML models based on two different learning techniques to detect anomalies. Finally, we evaluated the gathered results. Figure 7 visualizes the performed steps of the case study.

Figure 7: Phases for conducting case study




4 Edge Computing - A Survey on Existing Approaches

Since EC is an important emerging technology, there are many papers and articles concerning this topic [27, 28, 30, 44, 45]. Because EC is connected to many areas and has many applications that are relevant to this thesis, the EC survey on existing approaches is divided into the following sections.

Firstly, in Section 4.1, different edge computing systems are presented. Section 4.2 focuses on different approaches in IoT applications that integrate EC. Furthermore, Section 4.3 summarizes existing EC approaches in industry systems, mainly focusing on control, while Section 4.4 discusses how EC has been used in railway systems so far.

4.1 Edge Computing Systems

There is a large number of edge computing systems [44]. They differ in application scenarios, end devices, computation architecture, features, etc. Since various challenges are present in EC systems, many papers refer to them and tackle them differently. In this section, different EC applications, platforms, and challenges are reviewed.

Yi et al. [46] present LAVEA, an EC platform that provides low-latency video analytics. Both mobile-edge and inter-edge side designs are considered. Task offloading and bandwidth allocation are mathematically optimized. Also, nearby edge nodes are leveraged to reduce overall task execution time.

The paper [47] presents a forecasting system based on EC for smart homes. It focuses on short-term electricity demand forecasting. An intelligent home gateway is used to store the data on the central repository where the data is processed and analyzed. Essentially, the data is processed and analyzed on the edge of the power grid. The proposed system offloads data computation to EC servers to save resources and response time and provide a new type of power supply relationship.

Zhang et al. [48] propose a real-time face recognition system based on EC. The proposed system places the face recognition algorithm on the edge, which enables real-time recognition.

In [49], the authors present Qarnot, a new EC platform for the detection of acoustic events in smart buildings. This platform introduces computing nodes called Q.rads that also serve as heaters. These heaters create a network that can perform analysis of air quality or the acoustic detection of fire alarms.

Federated learning (FL) algorithms, deep learning algorithms designed for reducing privacy leakage, and their performance on EC systems are discussed in the letter [50]. The authors compare asynchronous and synchronous FL algorithms in terms of model communication between devices. The results indicate that an asynchronous FL algorithm behaves better in edge systems.

Wang et al. [51] focus on the convergence bound of gradient-descent-based FL algorithms and propose a new control algorithm. This algorithm is based on knowledge of the convergence bound and determines the best ratio of local updates and global aggregations to minimize the loss function under a fixed resource budget. The algorithm shows near-optimal performance for different data distributions.

Mogi et al. [52] present a load balancing method for IoT sensor systems using multi-access edge computing. When an event with the maximum load at that time occurs, this method replaces the data between servers.

In [53], the authors present a system model for analyzing the scalability and performance of large city-scale hybrid edge cloud systems. The main objective of this paper is to provide knowledge for selecting the right balance of edge and cloud resources for latency-constrained applications. The results indicate that increasing edge resource capacity without increasing internetwork bandwidth may increase network congestion and reduce system capacity.

The problem of finding an optimal computation offloading policy is considered in [54]. The offloading decision is made based on various factors that create a high-dimensional problem. The authors proposed a deep reinforcement learning algorithm for solving stochastic computation offloading. This algorithm combines a Q-function decomposition and a double deep Q-network algorithm.

Another paper [55] also considers a stochastic computation task scheduling policy, but it refers to MEC systems. The optimal scheduling policy is obtained with a one-dimensional algorithm based on the average delay of each task and the average power consumption at the mobile device.




4.2 Edge Computing in IoT

Since millions of IoT devices produce a huge amount of data that needs to be processed, EC has been introduced to solve IoT challenges. Therefore, many researchers have been interested in this topic. In [7], the authors showed that EC and IoT have similar characteristics. They reviewed how transmission, storage, and computation in IoT applications are improved with edge computing. The transmission time is reduced since the edge nodes are geographically distributed near the end-user. However, the storage space that the cloud provides is significantly larger than the storage space that edge nodes provide. Also, there is a security concern with edge node storage. Finally, the cloud has much more computational resources than edge nodes. However, IoT devices do not require many computational resources, thus EC nodes can successfully satisfy the computational demands. Based on these three characteristics, the problem space shown in Figure 8 has been created.

Figure 8: The problem space of Edge Computing-based IoT [7]

Gosh et al. [19] considered merging edge and cloud computing technologies for analyzing IoT data. The main idea is to reduce the data in the edge using a deep learning approach and to further analyze the data using ML in the cloud. The proposed approach gave satisfying results, showing that the data reduction did not have a big impact on the results.

Another paper that concerns a deep learning approach in IoT and EC is [56]. The authors introduced an elastic model for different deep learning models. Also, they solved the scheduling problem of deep learning tasks on the edge to optimize IoT applications.

An experimental evaluation of EC in mobile gaming is conducted in [26]. The authors focused on current 3-D arcade games that combine augmented reality and sensor information such as user location. These games need very quick responses to be successful. The goal of this research is to demonstrate that using EC can reduce response delay in complex 3-D environments, including virtual and augmented reality.

In [57], edge computing integration in the Long Range (LoRa) protocol has been researched. Smart cities, industrial IoT, animal tracking, and smart metering are some of the IoT applications that employ LoRa and that were analyzed for possible optimizations. The authors proposed a generic architecture to utilize EC advantages in IoT-based applications.

The security and privacy issues related to IoT have been reviewed in [21, 58]. The security features of IoT are confidentiality, integrity, availability, identification and authentication, privacy, and trust. These features are a measure for privacy and security issues in IoT infrastructures that integrate with EC.

Ren et al. [59] propose an IoT architecture based on transparent computing and present its benefits and challenges. Transparent computing provides services at the edge of the network for lightweight IoT devices.

Also, there are several papers concerning Mobile EC in IoT, such as [60, 61, 62, 63].




4.3 Edge Computing in Industrial Control Systems

Edge computing has been used in various industrial applications, hence a new term has appeared: industrial edge computing. It is used to describe the integration of communication, computation, and storage of resources in real-time applications [64]. The top three areas for industrial edge computing applications are the preventive maintenance of devices, quality control and optimization for process control, and product quality monitoring and optimization [64]. In this section, edge computing applications in industrial control systems are reviewed and analyzed.

Qian et al. [65] propose an edge computing framework for real-time fault diagnosis and dynamic control of rotating machines. To increase power density and efficiency, many mechanical and electrical components have been integrated. These components increase the complexity and failure risks of the rotating machines, hence many sensors are integrated to monitor the condition of the machines. The authors designed an edge computing node to acquire one vibration signal and three motor-phase current signals. Six motor conditions can be classified using synchronous processing of the signals on the edge. The framework detects faults in real-time and takes control of the motor when a fault is detected.

Pallasch et al. [66] present a platform that combines industrial control with cloud technologies. This approach separates responsibilities and requirements into four levels. The first layer (field layer) includes all physical components that perform actions and sense the data. The acquired data is pushed to the edge layer, which acts as an interconnection for the cloud and field layer. On the edge level, the data is processed, spooled, and compressed. Another function of the edge layer is to pull the processed data from the cloud. The last layer is the environment layer and its main purpose is to provide an interface for visualization.

In [67], the authors designed a proportional-integral (PI) motion controller for a permanent magnet DC motor integrated with IoT. The integration process consists of three levels: edge, gateway, and the cloud level. The preferred speed is sent from the cloud to the gateway level. The function of the gateway level is to transmit the data between the edge and the cloud level. Computations of output speed and the integral absolute error performance index take place on the edge level. The main goal of having an edge level is to improve latency, bandwidth, and energy consumption.

Skarin et al. [68] research mission-critical control at the edge. A mission-critical system is time and failure sensitive, and failures in such systems result in great loss. Therefore, this paper researches the potential of merged IoT, 5G, and cloud to avoid such failures. The authors presented an edge-cloud test-bed that imitates a mission-critical process. A Model Predictive Controller is deployed on the test-bed to evaluate system properties. The main goal of the evaluation is to authenticate application deployment in the edge, to verify dynamic reconfiguration during run-time, and to investigate the benefits of deploying mission-critical applications at the edge.

In [69], the authors discussed the cooperation of AI at the edge in industrial applications. IoT devices continuously monitor events in industrial applications and transmit that data to the server. The main challenge of IoT devices is power consumption and short battery lifetime. The authors addressed this challenge by proposing a forward central dynamic and available approach that optimizes power and extends battery lifetime in AI-based IoT devices. This approach adjusts the power level and duty cycle by using signal processing and fault diagnosis at acceptable reliability or packet loss ratio.

4.4 Edge Computing in Railway Systems

Edge computing is particularly suitable for transportation applications [70]. Data transmission to the cloud in real-time is very challenging, thus it is logical to perform signal processing and computation at the edge. Tasks that request urgent results and tasks that do not require a wide range of data sources are performed at the edge; otherwise, they are performed in the cloud. There is a small number of papers regarding edge computing frameworks in railway systems. In this section, the existing approaches are analyzed and reviewed.

Chen et al. [71] present fault detection for traction control systems in high-speed trains using an edge computing framework. Long-term usage of equipment leads to inevitable faults, and it is preferable to remove them since they endanger the safety of passengers. Due to the high number of sensors on high-speed trains and the development of data analysis, it is possible to detect faults in real-time. However, high-speed trains produce a huge amount of data in real-time. For example, one high-speed train with more than 1000 sensors produces about 50 Mb of data in one second and transfers it to the cloud. This is a challenge for the control center that has to make an online decision. Therefore, a train data analysis platform (TDAP) is proposed. The implementation of this framework is primarily tailored to the Chinese train control system 3, which has cab integrated radio and global system for mobile communications railway (GSM-R). It consists of four parts: train condition monitoring unit, railway data integrator, edge operation system, and TDAP library. The traction system is described in state-space representation, and a stable kernel representation is introduced to eliminate dynamic influence and to detect faults. The kernel representation is calculated using edge computing. The measured data from the traction system is transferred to the TDAP, which successfully performs online fault detection. The concept is tested on five fault cases with successful results. The cloud receives only the results, which considerably lowers the communication load.

Z. Liu et al. [70] discuss AI prognostics for high-speed railway systems and present a Cyber-Physical System framework that creates cyber twins. The main purpose of a cyber twin is to recreate a physical application and its features by using AI and the industrial Internet of Things. Cyber twins monitor real-time performance and predict possible faults by using signal processing and ML on already existing data. For the performance to be in real-time, edge computing is used. Tasks that do not need a wide range of data are performed on the edge, as well as the tasks that require an immediate response.

In [72], an SDN/NFV (Software-Defined Networking/Network Function Virtualization) framework is proposed. To increase the efficiency and reliability of railway systems, two use cases are considered. One of the use cases considers NFV-driven edge computing. Edge computing is mainly viewed as an efficient tool for data processing and raising alarms in risk situations.




5 Anomaly Detection in Control Systems - A Survey on Existing Approaches

There are a number of papers and articles concerning anomaly detection in various systems in order to improve operation and preventive maintenance. Different papers describe different ways of data observation and abnormality detection in various applications. Several papers that provide a detailed overview of anomaly detection are [73, 74, 75, 76, 77].

The paper [78] discusses the importance of anomaly detection in the field of the IoT. The authors proposed a technique for improving anomaly detection and Root Cause Analysis using preceding knowledge (historical data) in order to decrease the number of false positives and undetected events.

In paper [79], the data is observed in order to identify patterns, using auto-encoders and deep learning models. Four experiments are conducted to gain better insight into whether a distributed model using auto-encoders gives better results than a non-distributed model.

The paper [75] proposes a new approach to detecting anomalies based on Graph Convolutional Networks that can be used in different domains such as networks, medicine, industry, etc.

Alcaraz et al. [80] focused on anomaly detection techniques in the Smart Grid environment. The paper provides an analysis of different anomaly detection techniques and gives recommendations on the most suitable algorithm for different setups.

Anomaly detection has also been used in the aircraft industry [80]. This paper aims to predict faults of aircraft elements. A classification-based anomaly detection algorithm is applied to raw sensor data collected during the flight.

5.1 Machine Learning in Context of Anomaly Detection

Many concepts have been used for the detection of anomalies in different domains, such as ML, data mining, spectral theories, etc. ML is one of the most used concepts for the prediction of anomalous behavior [81]. Three ML techniques are used for this purpose: supervised, semi-supervised, and unsupervised learning, which are described in the following text.

5.1.1 Supervised Learning

Supervised learning is an ML technique that requires a labeled training data set [82]. The data set should be labeled as normal behavior data and anomalous (abnormal) behavior data. The main idea of supervised learning is to create a predictive model based on a given data set, so that new input data can be mapped to the output of the model. Supervised learning is reported to have better detection performance than semi-supervised and unsupervised learning. However, there are several disadvantages to this technique [5]. The first is the lack of data that can cover all areas of normal and abnormal behavior of the considered process. The second is the appearance of false positives when the new data contains noise. Also, it can be difficult to distinguish and label the anomalous data.
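To make the idea of mapping new input data to a label concrete, the following minimal sketch trains a nearest-centroid classifier on a small labeled data set. It is only an illustration: the feature values, the label names, and the choice of a nearest-centroid model are assumptions and are not taken from the thesis, which uses MathWorks tools.

```python
from statistics import mean

def fit_centroids(samples, labels):
    """Average the feature vectors of each class into one centroid per label."""
    by_label = {}
    for features, label in zip(samples, labels):
        by_label.setdefault(label, []).append(features)
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in by_label.items()
    }

def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(features, centroids[label]))
    return min(centroids, key=distance)

# Hypothetical labeled training set: (temperature, current) pairs.
train_x = [(60.0, 100.0), (62.0, 105.0), (61.0, 98.0), (95.0, 180.0), (98.0, 175.0)]
train_y = ["normal", "normal", "normal", "anomalous", "anomalous"]

model = fit_centroids(train_x, train_y)
print(predict(model, (63.0, 102.0)))
print(predict(model, (97.0, 178.0)))
```

The sketch also shows the first disadvantage mentioned above: the model can only recognize kinds of abnormal behavior that are represented in the labeled training data.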

5.1.2 Unsupervised Learning

Unlike supervised learning, unsupervised learning does not need a labeled training data set [5]. This technique relies on the theory of probability. The assumption is that the data contains a large number of normal behavior samples, while the number of anomalous samples is small. Unsupervised learning is very often used in different domains because it does not require human intervention. The main problem with this technique is a high number of false positives [82].
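The assumption that normal samples dominate the data can be illustrated with a simple outlier score that needs no labels. The sketch below uses the modified z-score based on the median absolute deviation (MAD); this is a generic statistical technique chosen for illustration, not the method implemented in the thesis, and the sample values are invented.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Return the indices whose modified z-score exceeds the threshold."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread in the data, so nothing can be ranked as unusual
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - center) / mad > threshold
    ]

# Hypothetical unlabeled signal: mostly normal values and one spike.
signal = [10.0, 10.2, 9.9, 10.1, 10.0, 25.0, 9.8, 10.1]
print(mad_outliers(signal))
```

Lowering the threshold makes the detector more sensitive but also flags more normal samples, which is the false positive problem noted above.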




5.1.3 Semi-supervised Learning

Semi-supervised learning is a hybrid of supervised and unsupervised learning techniques. It uses a small set of labeled data and a large amount of unlabeled data [5]. The labeled data set is usually a normal behavior data set. The problem that can arise using this technique is that any deviation from the normal data set can be considered an anomaly.

5.2 Anomaly Detection in Industrial Control Systems

Anomaly detection provides promising results in various Industrial Control Systems (ICS), thus many researchers have tackled this topic. Many different approaches are used to detect anomalies in ICS.

Kim et al. [76] presented a Sequence-to-Sequence approach in ICS and discuss it in detail. Only normal behavior data has been used for training. The algorithm detects outliers by measuring the errors between sensor data and predicted behavior. The results show that the model missed several failures and also produced false positives.

Stefanidis et al. [83] focused on the Hidden Markov Model in the Industrial Control Systems environment. The aim of the paper is to present a new approach for a Network Intrusion Detection System based on anomaly detection. The approach is tested on a real data set and it is an efficient model for detecting numerous attack vectors.

In order to detect abnormal behavior in ICS where the communication could be affected by noise and by changes in the normal behavior of the system, the paper [84] proposed the Adaptive Anomaly Detection in Industrial Control Systems framework. The proposed framework consists of a greedy approach and a neural network that is able to detect anomalies by modeling normal behavior. This paper contributes to the cybersecurity domain of ICS.

Bigham et al. [85] analyze the security of Supervisory Control And Data Acquisition (SCADA) systems using different anomaly detection algorithms. They report the performance of the algorithms based on experimental work for two different methods, N-Gram and Invariant Induction.

In [86], the importance of detecting anomalies in the field of Cyber Security for Industrial Automation and Control Systems is explained. The paper provides two models of implementation for tracking the traffic. Anomalies are defined and detected in terms of expected traffic that is delayed or that arrives at an inappropriate time.

5.3 Anomaly Detection in Railway Systems

Many researchers have tackled the problem of predicting failures in Railway Systems with a desire to improve safety and reduce costs of maintenance.

Papers [87, 88] focus on the maintenance of the door system on the train. In [87], the authors investigated two different ML methods (supervised and unsupervised) in order to identify the method for failure prediction with the smallest number of false positives when a low-pass filter is used. The paper [88] proposes the Mean Shift algorithm to detect incipient anomalies, and it is considered for only one abnormal state of the system.

Another paper on anomaly detection in the railway industry is [89]. The paper proposed a new strategy for the detection of abnormal behavior that uses data from different sensors on the train. The main focus of this novel strategy is to prevent train failures and make railway maintenance more efficient. The data used for the experiment comes mostly from temperature sensors, speed sensors, and accelerometers.

Butakova et al. [78] proposed a new approach for network anomaly detection in Digital Railway Communication Services. The paper aims to improve the safety of communications in transportation systems. The method combines two approaches, the fast spectral transform of traffic data and a decision-making process based on rough sets. The experimental results are obtained using real data.




The problem of bird's nests on the railway catenary is presented in [90], since bird's nests cause serious issues. The authors proposed a solution for detecting bird's nests on the railway catenary based on Deep Convolutional Generative Adversarial Networks (DCGANs). The DCGAN model uses image data to detect a bird's nest, which is considered an anomaly.

The paper [77] provides a detailed overview of anomaly detection and two unsupervised anomaly detection models. The efficiency of the algorithms, Isolation Forest and Auto-encoder, is evaluated using a real data set from thermal, acoustic, and impact sensors on a heavy haul railway line. This paper aims to optimize maintenance operations in the railway industry.

According to Ferreira et al. [91], anomaly detection is an effective technique to predict failures in railway systems and to facilitate maintenance operations. This paper evaluates and compares three different unsupervised anomaly detection techniques using real data from a railway: K-means, Self-organizing Map, and Auto-encoders. The authors found that the Auto-encoder algorithm is the most efficient of the three.

Lyu et al. [92] proposed a novel approach to detecting anomalies for the isoelectric line in railways. The method is based on image processing, and the DCGAN algorithm is used for training.

Kang et al. [93] focus on the online detection of anomalies in the train speed signal. They proposed a model scheme that can detect unexpected rapid changes in train speed, as well as a model that uses linear regression to detect smart attacks with gradual speed changes.
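The two kinds of speed anomalies mentioned above can be contrasted with a short sketch. This is not the model from [93]; it is an illustrative assumption in which rapid changes are found with a step threshold and gradual drift is found by fitting a least-squares line to the difference between measured and expected speed. All names, thresholds, and data are hypothetical.

```python
def rapid_changes(speeds, max_step):
    """Indices where the speed jumps by more than max_step between samples."""
    return [
        i for i in range(1, len(speeds))
        if abs(speeds[i] - speeds[i - 1]) > max_step
    ]

def fitted_slope(values):
    """Least-squares slope of the values against their sample index."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    return sxy / sxx

def gradual_drift(speeds, expected, max_slope):
    """True if the residual (measured - expected) trends faster than max_slope."""
    residuals = [s - e for s, e in zip(speeds, expected)]
    return abs(fitted_slope(residuals)) > max_slope

expected = [50.0] * 6
spiky = [50.0, 50.0, 58.0, 50.0, 50.0, 50.0]
drifting = [50.0, 50.5, 51.0, 51.5, 52.0, 52.5]

print(rapid_changes(spiky, max_step=5.0))
print(rapid_changes(drifting, max_step=5.0))
print(gradual_drift(drifting, expected, max_slope=0.2))
```

The drifting signal never exceeds the step threshold, so only the regression-based check reveals it, which is why the two detectors complement each other.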

Li et al. [94] proposed a new anomaly detection method, Scores Sequence, that has been used for railway systems data. Scores Sequence is a hybrid approach that combines supervised and unsupervised ML algorithms. The main idea of this algorithm is to consider a group of consecutive points instead of only one point.

5.4 Anomaly Detection in Networks

Anomaly detection for monitoring network traffic is on the rise [95]. The paper [73] emphasizes the vulnerability of new IoT systems connected to the public network, and the importance of anomaly detection in the self-defense of IoT systems.

Wireless Sensor Networks (WSN) are important elements of IoT systems. The paper [96] proposes an optimized method for detecting anomalies in WSN using different types of already existing detectors.

Korba et al. [97] describe a novel approach for detecting anomalies in ad hoc networks. Anomalies are malicious nodes in the ad hoc networks, and the new approach prevents system intrusions.

An anomaly detection method related to the SDN domain is presented in [98]. The method is mostly based on prior methods of anomaly detection in SDN, and it promises both accuracy and privacy.

Other papers that also concern anomaly detection in the domain of networks are [76, 83].




6 Overview

This thesis was conducted at Bombardier Transportation (BT) and addresses a way to collect data from their propulsion system, process the data, and analyze potential anomalies, which could improve propulsion system maintenance. Based on this, we can claim that the thesis contributes to both industry and academia, providing solutions and knowledge applicable to both. For the industry part, we have identified and implemented a way to collect relevant data samples and detect anomalies in that data. From the perspective of academia, we have contributed to increasing the knowledge related to edge computing in the domain of railway systems by conducting a survey and analyzing the collected data, as well as with experimental results from the implemented approach.

The thesis work is done as a part of an international project called RELIANCE, in which BT takes part [8]. RELIANCE stands for Resilient and scalable slicing over multiple domains. The project aims at providing a complete framework for data collection and analysis, developed in collaboration between several companies from three different countries (i.e., Spain, Sweden, and Turkey). The main focus of the companies from Sweden is on providing train data collection over 4G/5G networks and performing data analysis in the cloud. Figure 9 shows the contributions of the Swedish consortium within the RELIANCE project. As we can see, BT aims at collecting the data from the running train and preparing the data to be transmitted to the cloud in a suitable format.

Figure 9: Global overview of RELIANCE project [8]

To be more precise, our work is done within the Propulsion Control (PPC) department. Therefore, one of the first steps within this work has been to get familiar with the PPC System, which is a sub-system of a train managed by a Train Control and Management System (TCMS). The propulsion system that we have focused on is MITRAC EOS. To communicate with the MITRAC EOS propulsion system, we have used the Multiple Session Low-latency (MSLL) protocol, which is a product owned by BT. The MSLL protocol is explained in Section 6.1.




6.1 MSLL protocol

The MSLL protocol is a logging feature that utilizes UDP/IP transmission, hence it can only be used for systems that are connected to the network with IP addresses. User Datagram Protocol (UDP) is a connectionless protocol on the transport layer [99]. It has no handshaking at the beginning of the communication process and does not guarantee that a message will be delivered. Consequently, the MSLL protocol does not guarantee that a message will be delivered to the receiving side. The MSLL protocol works with small UDP/IP messages, and each message is sent directly to the recipient with no transmission control. The characteristics of the MSLL protocol are the following:

• real-time log from several systems at the same time,

• multiple users can log at the same time from one system,

• low latency since samples are sent directly from their sample levels,

• can co-exist with existing logging systems,

• communication and signal setup are not handled by the monitor function.
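The connectionless behavior of UDP described above can be seen in a minimal loopback sketch. It only illustrates the transport layer: the payload is an arbitrary byte string, not an MSLL message, since the MSLL message layout is not described here. The function name and the timeout value are assumptions for the example.

```python
import socket

def send_without_handshake(payload: bytes) -> bytes:
    """Send one UDP datagram over loopback and return what the receiver got."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(2.0)         # UDP itself never reports a lost datagram
        # No connect/accept step: the datagram is simply addressed and sent.
        sender.sendto(payload, receiver.getsockname())
        data, _addr = receiver.recvfrom(2048)
        return data
    finally:
        sender.close()
        receiver.close()

print(send_without_handshake(b"sample"))
```

Because the sender gets no acknowledgement, a receiver that needs to notice missing messages has to rely on a timeout or on application-level information, as the timeout in the sketch suggests.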




7 Our Approach

The main goal of this thesis was to develop an approach for detecting propulsion control system anomalies. The first step to tackle this challenge was to collect the data from the propulsion control system (i.e., sensors within the system). After collecting the data, the second step was to analyze the data and detect anomalies. Since the trains produce a large amount of data, we utilized EC advantages. We analyzed the data in close proximity to the data sources to optimize the network bandwidth and memory resources.

Since we had to collect the data from trains during operation, we created a client that communicates with the propulsion system. The client has to be ported to a computer that is certified for use on the train during operation. Computers that are used on BT trains run Linux, so the client has to run on the Linux platform as well; for this reason, it is called the Linux Client. The Linux Client supports the existing MSLL protocol, and it can also be referred to as the MSLL Client. The Linux Client uses the MSLL protocol to communicate with the MSLL server that is placed on the MITRAC EOS computer. The physical connection of the Linux Client and the MSLL server is shown in Figure 10.

Figure 10: The physical connection of the server and client

To implement the Linux Client, we have used the Python programming language. By using Python, the Linux Client becomes independent from the operating system, which is an advantage for future work and propulsion control system updates. To start collecting the data, the Linux Client should receive the configuration file as an input. The configuration file consists of the IP address of the computer that is running the client, the requested sample time for signals, and a list of signals that we want to collect. The configuration file was predefined by the domain experts in the form of a text file.
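As an illustration of how such a configuration could be read, the sketch below parses a text file with the three kinds of entries listed above. The key=value layout, the key names, and the signal names are hypothetical; the actual format predefined by the domain experts is not reproduced here.

```python
def parse_config(text):
    """Parse a hypothetical key=value configuration into a dictionary."""
    config = {"signals": []}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key == "client_ip":
            config["client_ip"] = value
        elif key == "sample_time_ms":
            config["sample_time_ms"] = int(value)
        elif key == "signal":
            config["signals"].append(value)
        else:
            raise ValueError(f"unknown configuration key: {key!r}")
    return config

example = """
# hypothetical layout, not the format used at BT
client_ip = 192.168.0.10
sample_time_ms = 100
signal = motor1_current
signal = motor1_temperature
"""
print(parse_config(example))
```

Rejecting unknown keys early keeps a typo in the configuration from silently producing a recording that lacks a signal.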

The first thing that the client does is a handshake with the MSLL server to check if it is possible to establish communication. When the handshake is completed, the client tries to open a session. After the session is opened, the client can request the signals that are set in the configuration file. Before the server starts responding with signal data, it sends a message with the types of the requested signals. After that, the client receives samples of the signals. The client must periodically send a keep-alive message to the server since the MSLL protocol communication is based on the connectionless UDP protocol. During all this time, the client checks whether a message from the server is an error message in order to perform error handling. The client can terminate the session established with the server at any time by sending the termination message. This procedure is shown in the activity diagram in Figure 11.
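The order of these steps can be summarized as a small state machine. The sketch below is an interpretation of the procedure described above, not the Linux Client code: the state names and event names are hypothetical, and treating every error message as the end of the session is a simplifying assumption.

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    HANDSHAKEN = auto()
    SESSION_OPEN = auto()
    SIGNALS_REQUESTED = auto()
    RECEIVING = auto()
    CLOSED = auto()

# Allowed transitions: (current state, event) -> next state.
TRANSITIONS = {
    (State.IDLE, "handshake_ok"): State.HANDSHAKEN,
    (State.HANDSHAKEN, "session_opened"): State.SESSION_OPEN,
    (State.SESSION_OPEN, "signals_requested"): State.SIGNALS_REQUESTED,
    (State.SIGNALS_REQUESTED, "signal_types"): State.RECEIVING,
    (State.RECEIVING, "sample"): State.RECEIVING,
    (State.RECEIVING, "keep_alive_sent"): State.RECEIVING,
}

def step(state, event):
    """Advance the client state; errors and termination end the session."""
    if event in ("error", "terminate"):
        return State.CLOSED
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state.name}") from None

def run(events):
    """Feed a sequence of events to the state machine, starting from IDLE."""
    state = State.IDLE
    for event in events:
        state = step(state, event)
    return state

session = ["handshake_ok", "session_opened", "signals_requested",
           "signal_types", "sample", "keep_alive_sent", "sample"]
print(run(session).name)
print(run(session + ["terminate"]).name)
```

Keeping the transitions in one table makes it easy to compare the implementation against an activity diagram such as the one in Figure 11.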

The signal data is recorded in a text file whose format was predefined by the domain experts, and in a CSV file. To enable signal visualization in one of BT's programs, the text file has to follow a specific format. Each text file has a unique name based on the IP address and the date and time of recording. It consists of signal names and signal values. An example of the text file format is shown in Figure 12.

The CSV file consists of signal names in the first row, followed by signal data values in the subsequent rows. An example of a CSV file is shown in Figure 13.
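A minimal sketch of this recording step is given below. It writes the signal names as the header row followed by one row per sample, and builds a file name from the IP address and the recording time. The exact naming pattern, the signal names, and the sample values are assumptions made for the example.

```python
import csv
import io
from datetime import datetime

def recording_name(ip_address, started, extension):
    """Build a file name from the IP address and the recording start time."""
    stamp = started.strftime("%Y%m%d_%H%M%S")
    return f"{ip_address.replace('.', '-')}_{stamp}.{extension}"

def write_csv(stream, signal_names, rows):
    """Write the signal names as the header row, then one row per sample."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(signal_names)
    writer.writerows(rows)

# An in-memory buffer stands in for the file so the example has no side effects.
buffer = io.StringIO()
write_csv(buffer, ["speed", "motor1_current"], [[12.5, 101.0], [12.7, 103.5]])
print(recording_name("192.168.0.10", datetime(2020, 5, 4, 13, 30, 0), "csv"))
print(buffer.getvalue())
```

To write to disk instead, the buffer can be replaced with a file opened as `open(name, "w", newline="")`, which is the mode the `csv` module documentation recommends.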


Figure 11: Activity diagram for Linux Client - Server communication


Figure 12: An example of a text file

Figure 13: An example of an excel file

Once the signal data had been collected, we were able to continue with the data analysis. Usually, the data is sent to the cloud, and all computations are done there. The main challenge in our case is the large amount of data, which takes up a lot of memory and is difficult to transmit. Therefore, we decided to analyze the data on the computer that runs the Linux Client. This computer represents the edge of the network and acts as an edge node.

After the desired signal analysis is conducted, the results are ready for transmission to the cloud. In our case, we did not send the results to the cloud, as the work is part of a large international project and the infrastructure is not ready yet. At this point, the main focus has been on preparing the results for transmission to the cloud. Our contribution to the RELIANCE project is shown in green in Figure 14. Our approach to data analysis and anomaly detection is described in Section 7.1.

Figure 14: Position of our thesis in RELIANCE project

7.1 Anomaly Detection

The data has been collected in the RTS (Real-Time Simulator) lab, which has been configured for the Green Cargo Rd locomotive shown in Figure 15. The Rd locomotive is a modernized Rc2 locomotive. There are two traction motors in each bogie and two bogies in one Rd locomotive. The line voltage is 15 kV, the frequency is 16 2/3 Hz, and the continuous transformer power is 3910 kVA. The simulator in the RTS lab has four motor controllers that represent the traction motors, and it is called a rack.

Figure 15: Rd locomotive [9]

The signals that we have obtained come from one rack. Each motor controller in the rack has sensors for measuring the voltage, the current, and two different kinds of temperature. The speed and power signals are related to the rack as a whole, which means they are connected with all motors in the rack. Therefore, to perform a quality analysis of the data, we had to consider the features of the locomotive that is driving the train, the existing signals, and their features. After data collection, we discussed the signal features thoroughly with the domain experts. Through these discussions, we were able to understand what kinds of anomalies exist and what the expected signal behavior is.

The experts asked that the current signal be the main focus of our anomaly detection, since it is one of the most important signals in the monitoring process. The normal current value goes up to 1900 A, and everything above this value is considered an overcurrent. Overcurrent is one of the most dangerous phenomena in any system and can lead to serious problems and potential anomalies in the propulsion system. Through signal analysis, we found that the overcurrent is classified as an outlier anomaly. The system might also go into a dangerous state when high values of the current signal appear together with low values of the speed and power signals. This is also considered an anomaly, namely a contextual anomaly. For more about anomaly classification, we refer the reader to Section 2.4.
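The two anomaly types described above can be expressed as simple rules, sketched below in Python. Only the 1900 A overcurrent limit comes from this paragraph. The 1000 A level for a "high" current in the contextual rule is taken from the contextual example discussed in Section 7.1.1, and the limits for "low" speed and power (LOW_SPEED, LOW_POWER) and their units are hypothetical placeholders.

```python
OVERCURRENT_LIMIT_A = 1900.0   # from the text: values above this are overcurrent
CONTEXT_CURRENT_A = 1000.0     # illustrative level for "high current" in context
LOW_SPEED = 5.0                # hypothetical limit for "low speed"
LOW_POWER = 50.0               # hypothetical limit for "low power"


def classify_sample(current_a, speed, power):
    """Label one sample as 'point', 'contextual', or 'normal'."""
    if current_a > OVERCURRENT_LIMIT_A:
        return "point"  # overcurrent is an outlier on its own
    if current_a > CONTEXT_CURRENT_A and speed < LOW_SPEED and power < LOW_POWER:
        return "contextual"  # high current is only suspicious in this context
    return "normal"


print(classify_sample(2000.0, 80.0, 3000.0))  # overcurrent
print(classify_sample(1200.0, 1.0, 10.0))     # high current at low speed and power
print(classify_sample(1200.0, 80.0, 3000.0))  # same current, normal context
```

The same current value of 1200 A is labeled differently in the last two calls, which is exactly what distinguishes a contextual anomaly from a point anomaly.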

Since the current signal depends on a few parameters that are represented by other signals, anomalies of the current signal may indicate several system failures, such as a short circuit or a motor, sensor, or cable failure. Also, in discussion with the experts, we concluded that both motors in the same bogie behave similarly. The current signals from the two motors show similar changes in amplitude over time. Their values do not have to be the same, but the deviation between the two current signal values has to be within specified limits. The experts proposed limiting this deviation to 300 A.
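The 300 A deviation limit between the two motors of one bogie can be checked sample by sample, as in the following Python sketch. The function name and the example values are assumptions made for illustration.

```python
MAX_DEVIATION_A = 300.0  # limit proposed by the domain experts


def deviation_violations(motor_a, motor_b, limit_a=MAX_DEVIATION_A):
    """Return the sample indices where the two currents differ by more than the limit."""
    if len(motor_a) != len(motor_b):
        raise ValueError("both signals must have the same number of samples")
    return [
        index
        for index, (a, b) in enumerate(zip(motor_a, motor_b))
        if abs(a - b) > limit_a
    ]


# Hypothetical current samples (in A) from two motors in the same bogie.
motor1 = [800.0, 950.0, 1200.0, 1210.0]
motor2 = [790.0, 960.0, 850.0, 1190.0]
violations = deviation_violations(motor1, motor2)
print(violations)
```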

To detect and analyze anomalies, we decided to use MATLAB, since it is an efficient tool for various methods of digital signal processing and data analysis. Using MATLAB required transferring the data collected with the Linux Client to the MATLAB workspace. Doing this manually is slow and requires human intervention. Therefore, we automated the process to make it faster and more efficient. This is done by recording the data in an Excel file that can easily be loaded by a MATLAB script. In addition, the Linux Client runs a MATLAB script that uses a previously trained model to detect anomalies in new data. Figure 16 depicts the phases of the anomaly detection process.

Figure 16: Phases in Anomaly Detection

To train a model that detects anomalies, we have used two ML methods, i.e., unsupervised and supervised learning. The main reason for using two different methods is that the data we collected in the RTS lab was not labeled. While we applied the unsupervised learning method, the domain experts analyzed the data and labeled it appropriately. Afterward, we were able to apply the supervised learning method. We have applied existing ML algorithms in MATLAB and adapted them to our needs. These methods are explained in Section 7.1.1 and Section 7.1.2.

7.1.1 Unsupervised Learning

For the unsupervised learning method, we have used Principal Component Analysis (PCA) and a binary classification Decision Tree algorithm to create a predictive model. As explained in Section 5.1, an unsupervised learning technique does not require a labeled data-set, so the algorithm is applied to unlabeled data-sets. The Decision Tree algorithm belongs to the group of supervised ML algorithms, but it can also be used as an unsupervised learning technique. The reason for using PCA was to reduce the dimensionality of the data-sets, since we considered signals from six sensors. The PCA method gives us eigenvectors for the principal components, which allow a large set of variables to be represented by a smaller set.

Since we considered signals from six sensors, the PCA algorithm gives us six principal components. The percentage of the data represented by each principal component and by the combined principal components is shown in Figure 17a. The orange dots show the contribution of each individual principal component to the data representation, while the blue dots show the cumulative contribution of the combined principal components. As we can see in Figure 17a, the first and second principal components together represent 99.7% of the data-set. Hence, most of the information lies in a two-dimensional subspace, and by plotting the first two principal components together, we can represent almost all the information from the six sensors. The plot of the first two principal components is shown in Figure 17b, while Figure 17c shows the density of the data distribution, where lighter areas contain fewer data points and darker areas contain more. The data shown in Figure 17b is interpreted as normal behavior for points in the green rectangle, abnormal behavior for points in the red rectangle, and a warning for points in the orange rectangle.
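To illustrate how the individual and cumulative contributions of the principal components are obtained, the following Python sketch computes them with NumPy on synthetic data. It is not the MATLAB code used in this work: the data is randomly generated with two dominant hidden factors mixed into six columns, and whether the real signals were scaled before PCA is not stated here, so the sketch only centers them.

```python
import numpy as np


def explained_variance_ratio(data):
    """Return the share of total variance carried by each principal component.

    data has shape (n_samples, n_features); the result is sorted from the
    first (largest) to the last principal component.
    """
    centered = data - data.mean(axis=0)
    # Squared singular values of the centered data are proportional to the
    # eigenvalues of its covariance matrix.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2)) * np.array([10.0, 3.0])  # two hidden factors
mixing = rng.normal(size=(2, 6))                            # spread over six "sensors"
data = latent @ mixing + 0.05 * rng.normal(size=(500, 6))   # plus small noise

ratios = explained_variance_ratio(data)   # individual contributions
cumulative = np.cumsum(ratios)            # cumulative contributions
print(np.round(ratios, 4))
print(np.round(cumulative, 4))
```

The individual ratios correspond to the orange dots and the cumulative sums to the blue dots of a plot such as Figure 17a.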

The model for anomaly prediction has been trained on one data-set that we obtained in the RTS lab by simulating one trip. The trained model is tested on several different data-sets for the same setup in the RTS lab, which means the same train but different trips. Figures 18 to 22 show the results of the trained model.

The real and predicted values of the current for the first test data-set are shown in Figure 18a, while the error between them is shown in Figure 18b. As we can see, the predicted current deviates from the real one within certain limits, and the largest deviation from the real current is 250.8 A at sample time 15.897 h, see Figure 18c. This deviation is considered to be within the allowed limits, since we have decided, in agreement with the domain experts, to set the alarm state for deviations higher than 300 A. Therefore, this data-set is clustered as normal behavior of the propulsion system.

Figure 17: (a) Evaluation of each principal component (b) The plot of the first and second components of PCA for one data-set (c) Density of the data distribution explained by the first and second principal components

The real and predicted values of the current for the second test data-set are shown in Figure 19a, while the error between them is shown in Figure 19b. This data-set is clustered as anomalous, since the maximum deviation between the real and predicted values of the current is greater than 300 A. This anomaly is considered a point anomaly because the current reaches values higher than 1900 A. If we analyze the error between the real and predicted current in Figure 19b without considering the anomalous points, we can see that the error is within the allowed limits.

Figure 20a presents the real and predicted values of the current for the third data-set, and Figure 20b presents the corresponding error signal. This data-set is also clustered as anomalous. As we can see, the data-set has a point anomaly, since values of the current are greater than 1900 A. However, this data-set would be clustered as anomalous even without considering these anomalous high points of the current: the error between the real and predicted current, see Figure 20b, reaches values greater than 300 A before and after the anomalous points. Hence, in this case the trained model also gives false-positive results.

The fourth data-set is clustered as normal, since the error between the real and predicted current, see Figure 21b, is within the allowed limits. The plot of the real and predicted current is shown in Figure 21a.

The next data-set is clustered as anomalous, see Figure 22. This data-set contains a contextual anomaly, since the current reaches values higher than 1000 A at low values of power and speed. The real and predicted current is shown in Figure 22a, while the corresponding error is shown in Figure 22b.
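The decision applied to each test data-set above, comparing the largest error between the real and predicted current with the 300 A alarm limit, can be sketched in Python as follows. The function name and the example values are assumptions; the prediction itself comes from the trained model and is simply given as input here.

```python
ALARM_DEVIATION_A = 300.0  # alarm limit agreed with the domain experts


def evaluate_dataset(real, predicted, limit_a=ALARM_DEVIATION_A):
    """Find the largest absolute prediction error and compare it with the limit."""
    if len(real) != len(predicted) or not real:
        raise ValueError("signals must be non-empty and of equal length")
    errors = [abs(r - p) for r, p in zip(real, predicted)]
    worst_index = max(range(len(errors)), key=errors.__getitem__)
    return {
        "max_error_a": errors[worst_index],
        "index": worst_index,
        "anomalous": errors[worst_index] > limit_a,
    }


# Hypothetical real and predicted current values in A.
real = [400.0, 900.0, 1500.0, 1950.0, 700.0]
predicted = [420.0, 880.0, 1450.0, 1500.0, 690.0]
result = evaluate_dataset(real, predicted)
print(result)
```

A data-set whose largest error stays below the limit, such as the 250.8 A reported for the first test data-set, would be reported with "anomalous" set to False.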

Figure 18: (a) Real and predicted signal of current (b) Error between real values and predicted values of current (c) Maximum error between real and predicted values of current

Figure 19: (a) Real and predicted signal of current (b) Error between real values and predicted values of current

The real and predicted currents shown in Figures 18-22 are filtered values, since the real current consists of direct and alternating components (i.e., DC and AC). The AC ripple appears as a product of the signal transformations performed on trains, and it can be considered a noise signal. The amplitude spectrum of the real current is shown in Figure 23, where we can see the frequencies that the current signal consists of. Since the DC part of the current is dominant, we have filtered the signals using a low-pass filter with a cut-off frequency of 0.5 Hz.
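The effect of such a filter can be illustrated with a short Python sketch. The filter type used in this work is not specified here, so the sketch assumes a simple first-order low-pass filter; the sample rate of 100 Hz, the 10 Hz ripple frequency, and the signal amplitudes are likewise hypothetical. Only the 0.5 Hz cut-off frequency is taken from the text.

```python
import math


def low_pass(samples, cutoff_hz, sample_rate_hz):
    """First-order low-pass filter (exponential smoothing form)."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)  # smoothing factor between 0 and 1
    filtered = []
    previous = samples[0]
    for value in samples:
        previous = previous + alpha * (value - previous)
        filtered.append(previous)
    return filtered


sample_rate_hz = 100.0  # hypothetical sample rate
dc_level_a = 1000.0     # hypothetical DC part of the current
# DC level with a 10 Hz ripple of 50 A amplitude standing in for the AC part.
raw = [
    dc_level_a + 50.0 * math.sin(2.0 * math.pi * 10.0 * n / sample_rate_hz)
    for n in range(2000)
]
smoothed = low_pass(raw, cutoff_hz=0.5, sample_rate_hz=sample_rate_hz)

tail = smoothed[1000:]  # skip the start-up part of the filter
residual_ripple_a = max(tail) - min(tail)
mean_level_a = sum(tail) / len(tail)
print(round(residual_ripple_a, 2), round(mean_level_a, 2))
```

The DC level passes through almost unchanged, while the peak-to-peak ripple of 100 A in the raw signal is reduced to a few amperes.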

To conclude, we have noticed that the model predicts the values better when the data-set consists of lower values of current. Analyzing the training data-set, we have seen that it contains more low values than high values of current. Since we did not have a data-set with approximately equal numbers of lower and higher values, we decided to use this data-set, as it gave us the best results. The model can detect two types of anomalies, contextual and point anomalies, with a resolution of 300 A. The disadvantage of this model is its inability to detect point anomalies when the values of the real current are high and the anomalous values of the current are low, because the alarm is raised on an error of 300 A. This limit does not affect contextual anomalies. Therefore, this model is acceptable under the assumptions that anomalies do not appear frequently and that values of the current are mostly in the range of 0 to 1600 A.

7.1.2 Supervised Learning

To conduct this method, we have used the current signal as well as five time-series signals related to it. As explained in Section 5.1, a supervised learning technique requires data-sets with labeled normal and abnormal behaviors. Supervised learning enables the labeling of multiple different faults. This is beneficial in later work for understanding the cause of a detected anomaly. The domain experts labeled the previously explained anomalies of the current signal with one faulty behavior label. So, the data from all the signals in one simulated train trip is labeled as normal or abnormal behavior. In our data-set, we have had 24 behaviors labeled as normal and 8 behaviors labeled as abnormal, of which 6 are labeled as point anomalies and 2 as contextual anomalies.

Figure 23: Amplitude spectrum of current

To apply the ML algorithms, we had to reduce the dimensionality of the time-series data. We decided to represent the signals by their most important features, so we applied feature extraction to the signals. Some of the most important features that we have used to represent the signals are peak value, crest factor, impulse factor, kurtosis, and clearance factor. We have used these features to train several ML algorithms in the MATLAB Classification Learner app. To reduce overfitting of the model, we have used cross-validation, which partitions the data-set into five folds and estimates the accuracy on each fold.
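The listed features can be computed from a signal as in the following Python sketch. It uses the common textbook definitions of these features, which we assume match the ones used in MATLAB; the function name and the example signal are made up for illustration.

```python
import math


def signal_features(samples):
    """Compute common time-domain features of one signal.

    Assumes a non-constant signal with at least one non-zero sample,
    otherwise some of the ratios are undefined.
    """
    n = len(samples)
    mean = sum(samples) / n
    abs_values = [abs(x) for x in samples]
    peak = max(abs_values)                                  # largest absolute value
    rms = math.sqrt(sum(x * x for x in samples) / n)        # root mean square
    mean_abs = sum(abs_values) / n
    mean_sqrt_abs = sum(math.sqrt(a) for a in abs_values) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    fourth_moment = sum((x - mean) ** 4 for x in samples) / n
    return {
        "peak_value": peak,
        "crest_factor": peak / rms,
        "impulse_factor": peak / mean_abs,
        "clearance_factor": peak / mean_sqrt_abs ** 2,
        "kurtosis": fourth_moment / variance ** 2,
    }


features = signal_features([0.0, 0.0, 0.0, 4.0])  # one spike in a flat signal
for name, value in features.items():
    print(name, round(value, 3))
```

Each time series is reduced to a handful of numbers in this way, so one simulated trip becomes one row of the table on which the classifiers are trained.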

The models that have shown the best accuracy rate, 87.5%, are Tree and Ensemble Subspace Discriminant. There are three Tree models in Classification Learner, Fine, Medium, and Coarse, and they all gave the same results. After training the models, we used a scatter plot to visualize the relation between features. The scatter plot of the trained model shows correct model predictions with a dot, while incorrect predictions are shown with an x. Data with normal behavior is shown in orange, data with point anomalies in blue, and data with contextual anomalies in yellow. Since the scatter plots of the Tree and Ensemble models are the same, we only present the scatter plot of the Tree model in Figure 24a. We can conclude that the normal data is classified correctly, but the model provided only four correct classifications of data with anomalies. The incorrect classifications of the model are false negatives. An orange x shows where the model has classified the data as normal when it had an anomaly.

Figure 24: (a) Scatter plot of Tree model predictions (b) Confusion matrix of Tree model

The confusion matrix is another way to visualize the results of the prediction model. Rows in the confusion matrix represent the true classes, and columns represent the predicted classes; both are named by class labels. The confusion matrix of the Tree models is presented in Figure 24b. The confusion matrix of the Ensemble model is the same as that of the Tree model. Since we have labeled the data with point anomalies as 0, normal data as 1, and data with contextual anomalies as 10, the columns and rows are labeled 0, 1, and 10. The numbers of correctly predicted samples are placed on the diagonal, and the other fields contain the numbers of incorrectly predicted samples. Figure 24b also shows that only four samples of abnormal data are classified correctly. Furthermore, all data with contextual anomalies is incorrectly classified, since the model uses one of the two contextual samples for training and the other one for testing. Since there was not enough data with contextual anomalies, and one sample is not enough to train the model, we have decided to exclude the data with contextual anomalies in the next step and repeat the process.
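As a worked example, the following Python sketch builds a confusion matrix from the counts stated in the text and checks that it yields the reported accuracy of 87.5%. The matrix is reconstructed from the description (24 normal samples classified correctly, four of the six point anomalies classified correctly, and all remaining anomalies classified as normal), not read from Figure 24b, so it should be treated as an assumption.

```python
LABELS = [0, 1, 10]  # 0 = point anomaly, 1 = normal, 10 = contextual anomaly


def confusion_matrix(true_labels, predicted_labels, labels=LABELS):
    """Rows are true classes, columns are predicted classes."""
    position = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[position[true]][position[predicted]] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal, i.e. correctly predicted."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# 24 normal, 6 point anomalies, 2 contextual anomalies.
true_labels = [1] * 24 + [0] * 6 + [10] * 2
# Assumed predictions: all misclassified anomalies are predicted as normal.
predicted_labels = [1] * 24 + [0] * 4 + [1] * 2 + [1] * 2

matrix = confusion_matrix(true_labels, predicted_labels)
for label, row in zip(LABELS, matrix):
    print(label, row)
print(accuracy(matrix))
```

With 28 of 32 samples on the diagonal, the accuracy is 28/32 = 0.875, which agrees with the reported value.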

Next, we trained several ML algorithms on the data with only point anomalies. The classifiers that showed the best results are Naive Bayes Kernel and Ensemble Subspace KNN, with an accuracy of 96.7% and only one bad prediction each. The Naive Bayes classifier is based on Bayes’ theorem with the assumption that the features are independent [100], and it uses a kernel distribution for numeric predictors. The Ensemble classifier combines multiple ML models to provide better predictions, and the one that had the best results uses nearest neighbor classifiers with a random subspace algorithm. The classifiers that had 93.3% accuracy and two bad predictions are Naive Bayes Gaussian, Linear SVM, and Ensemble Subspace Discriminant. Other trained classifiers with lower accuracy will not be discussed. The incorrect results of all these models are false negatives.
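To make the idea of a Naive Bayes classifier with a kernel distribution more concrete, the following Python sketch implements a very small version of it. It is not the MATLAB implementation used in this work: the Gaussian kernel, the fixed bandwidth of 1.0, the function names, and the two-feature training data are all assumptions made for illustration. With 30 samples (24 normal and 6 with point anomalies), one wrong prediction corresponds to 29/30, about 96.7%, and two wrong predictions to 28/30, about 93.3%.

```python
import math


def kernel_density(x, samples, bandwidth):
    """Gaussian kernel density estimate at x from one feature's training values."""
    scale = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    return scale * total / len(samples)


def fit(rows, labels):
    """Store, per class, one list of training values for each feature."""
    model = {}
    for row, label in zip(rows, labels):
        columns = model.setdefault(label, [[] for _ in row])
        for column, value in zip(columns, row):
            column.append(value)
    return model


def predict(model, row, bandwidth=1.0):
    """Pick the class with the largest log prior plus summed log densities."""
    n_total = sum(len(columns[0]) for columns in model.values())
    scores = {}
    for label, columns in model.items():
        score = math.log(len(columns[0]) / n_total)  # class prior
        for value, column in zip(row, columns):
            # Independence assumption: per-feature densities are multiplied,
            # which becomes a sum in log space. The tiny constant avoids log(0).
            score += math.log(kernel_density(value, column, bandwidth) + 1e-300)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical two-feature rows; label 1 = normal, 0 = point anomaly.
rows = [(1.0, 3.0), (1.2, 3.1), (0.9, 2.8), (1.1, 3.3), (4.0, 9.0), (4.3, 8.5)]
labels = [1, 1, 1, 1, 0, 0]
model = fit(rows, labels)

print(predict(model, (1.05, 3.0)))
print(predict(model, (4.1, 8.8)))
```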

Figure 25a shows the scatter plot of the Naive Bayes model predictions. As already stated, the Naive Bayes model has only one bad prediction, which can also be seen in the plot as an orange x. This means that the true class is abnormal behavior, and the model classified it as normal. Figure 25b shows the scatter plot of the Ensemble Subspace KNN model predictions. By observing this scatter plot, we can conclude that the model made a mistake with one prediction. The Ensemble model also predicted normal behavior for data that had an anomaly. By comparing Figure 25a and Figure 25b, we can see that the two models made their mistakes on different data.

The confusion matrix for the Naive Bayes and Ensemble models is the same, so Figure 25c shows only the confusion matrix of the Naive Bayes model. As already stated, these models made only one incorrect prediction each, and in Figure 25c we can see that one sample is incorrectly predicted.

Figure 25: (a) Scatter plot of Naive Bayes predictions (b) Scatter plot of Ensemble predictions (c) Confusion matrix of Naive Bayes model

To conclude, the supervised learning method achieves very good results in detecting point anomalies. However, the model cannot detect contextual anomalies since there has not been enough data to train it properly. By extending the data-set with data that contains contextual anomalies, we could train the model again and possibly detect contextual anomalies as well.


8 Related Work

In this section, we compare our anomaly detection approach with existing approaches in the railway domain, focusing on the considered sensors, functionalities, and methods. Detecting anomalies in railway systems is a broad area, and various problems can be considered.

Some papers tackle the problem of detecting anomalies in the door system [87, 88], while Butakova et al. [78] proposed a novel approach for anomaly detection related to the communication system. Some papers tackle a problem with isoelectric line [92], as well as bird’s nest on rai
