
DEVELOPING A FRAMEWORK TO MITIGATE THE GROWING INCIDENTS OF CYBER-SECURITY THREATS ON PROCESS CONTROL NETWORKS (PCN): A CASE STUDY OF THE PETROCHEMICAL INDUSTRY

A Dissertation Presented to

The Engineering Institute of Technology

by

Abimbola Ogunlade

In Partial Fulfillment of the Requirements for the Degree

Master of Engineering in INDUSTRIAL AUTOMATION

JUNE 2017

COPYRIGHT © 2017 BY ABIMBOLA OGUNLADE


DEDICATION

This thesis is dedicated to my loving wife, parents and my three sons: Tobi, Ayo and Kunle Ogunlade.

ACKNOWLEDGMENT

First and foremost, I would like to thank God Almighty, the Creator of Heavens and Earth, for without Him none of this would be possible.

I would like to express my profound gratitude to my supervisor, Dr. Hadi, for giving me the opportunity to work under his professional supervision. His motivation and guidance were admirable and inspiring. I am sincerely grateful for the time and effort he took to provide valuable advice and comments on the entire thesis, particularly on machine learning, an area I may consider for my PhD research.

Finally, I must express my profound appreciation to my loving wife, Dr Prudence Ogunlade, for her unfailing support and continuous encouragement throughout my years of study. This achievement would not have been possible without her emotional support. Thank you.

SUMMARY

Cyber security threats are emerging faster than ever. Exploits and incidents affecting Process Control Networks (PCN) are becoming more prominent and more sophisticated in the way they are orchestrated. Many petrochemical organizations simply cannot keep up with the pace of cyber threats, as defenses appear obsolete almost as soon as they are implemented.

This thesis proposes a framework to mitigate the increasing incidence of cyber security exploits in the petrochemical industry. The framework takes a hybrid approach, combining machine learning predictors with a traditional risk management strategy in which machine learning-based intrusion detectors play a central role. Several machine learning algorithms are reviewed and discussed. Using the WEKA machine learning software to analyze captured data, three classifiers were applied: Random Tree, Multilayer Perceptron and Naïve Bayes.

The analysis identified the key reference features used in the classification: the “protocol” and “length” fields of each data packet served as the reference indicators. With these features, the Random Tree model achieved a classification (detection) accuracy of about 98% under both 10-fold cross-validation and percentage-split evaluation. On the same set of captured data, the Multilayer Perceptron and Naïve Bayes classifiers obtained accuracies of 97% and 94%, respectively.
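The two evaluation modes named above differ in how often each record is used for testing. A minimal standard-library Python sketch of both follows. It is not WEKA's implementation: a simple OneR-style rule on the protocol field stands in for the Random Tree, Multilayer Perceptron and Naïve Bayes classifiers, the folds are not stratified (WEKA stratifies them by default), the 66% training share is an assumed value, and the packet records and all function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def train_one_rule(train):
    """Stand-in classifier (a OneR-style rule on the protocol attribute):
    predict the label seen most often for that protocol during training."""
    by_proto = defaultdict(Counter)
    for proto, _length, label in train:
        by_proto[proto][label] += 1
    # Protocols never seen in training fall back to the overall majority label.
    fallback = Counter(label for _, _, label in train).most_common(1)[0][0]
    rule = {p: counts.most_common(1)[0][0] for p, counts in by_proto.items()}
    return lambda proto, length: rule.get(proto, fallback)


def count_correct(predict, test):
    return sum(predict(proto, length) == label for proto, length, label in test)


def cross_validate(data, train_fn, k=10, seed=1):
    """k-fold cross-validation: each record is tested exactly once, by a model
    trained on the other k-1 folds. Returns accuracy pooled over all folds."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    correct = 0
    for i, test in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        correct += count_correct(train_fn(train), test)
    return correct / len(rows)


def percentage_split(data, train_fn, train_pct=66, seed=1):
    """Hold-out evaluation: train once on train_pct% of a shuffled copy,
    then test once on the remaining records."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    cut = len(rows) * train_pct // 100
    test = rows[cut:]
    return count_correct(train_fn(rows[:cut]), test) / len(test)


# Hypothetical toy records: (protocol, frame length in bytes, class label).
toy = ([("TCP", 60, "normal")] * 12 + [("ARP", 42, "normal")] * 12
       + [("UDP", 1042, "attack")] * 10 + [("TCP", 1514, "attack")] * 6)

cv_acc = cross_validate(toy, train_one_rule)
split_acc = percentage_split(toy, train_one_rule)
print(f"10-fold CV accuracy: {cv_acc:.2f}")  # prints "10-fold CV accuracy: 0.85"
print(f"66% split accuracy: {split_acc:.2f}")
```

On this toy data the protocol rule can never flag the six long TCP attack frames, so cross-validation reports exactly 34/40. Because every record is tested once, that figure does not hinge on a single test set, whereas the percentage-split figure can move with the shuffle seed on small captures.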


These results demonstrate the considerable opportunity offered by active learning, in which machine learning works in collaboration with a PCN administrator, for detecting and analyzing data traffic within the PCN environment.
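Active learning can take several forms; the process actually used in this thesis is described in Section 4.6. Purely as an illustration of the idea, the sketch below implements pool-based active learning with uncertainty sampling, a common query strategy in which the learner repeatedly asks the administrator to label the packet it is least certain about. The single-feature threshold rule, the assumption that attack packets are longer and separable from normal ones, the `ask_admin` stand-in for the administrator, and all packet lengths are hypothetical.

```python
def fit_threshold(labelled):
    """Fit a 1-D rule on packet length, assuming (hypothetically) that attack
    packets are longer and the classes are separable: the boundary sits midway
    between the longest 'normal' and the shortest 'attack' packet seen so far."""
    longest_normal = max(n for n, y in labelled if y == "normal")
    shortest_attack = min(n for n, y in labelled if y == "attack")
    return (longest_normal + shortest_attack) / 2


def classify(threshold, length):
    return "attack" if length > threshold else "normal"


def active_learning(labelled, pool, ask_admin, budget):
    """Pool-based active learning with uncertainty sampling: each round, the
    unlabelled packet closest to the current boundary (the least certain
    prediction) is sent to the administrator, and the rule is refitted."""
    labelled, pool, queried = list(labelled), list(pool), []
    for _ in range(min(budget, len(pool))):
        threshold = fit_threshold(labelled)
        query = min(pool, key=lambda n: abs(n - threshold))
        pool.remove(query)
        labelled.append((query, ask_admin(query)))
        queried.append(query)
    return fit_threshold(labelled), queried


def ask_admin(length):
    """Hypothetical stand-in for the PCN administrator's judgement; the learner
    never sees this rule, only the labels it returns."""
    return "attack" if length > 300 else "normal"


seed_labels = [(60, "normal"), (1500, "attack")]
unlabelled = [64, 70, 120, 250, 320, 404, 512, 900]
threshold, queried = active_learning(seed_labels, unlabelled, ask_admin, budget=4)
print(queried)    # [900, 512, 320, 250]
print(threshold)  # 285.0
```

In this toy run, four administrator labels are enough for the rule to classify all eight pooled packets the same way as the stand-in administrator, which is the label saving that motivates keeping a human in the loop.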


TABLE OF CONTENTS

Dedication
Acknowledgment
List of Abbreviations
List of Figures
List of Tables
Chapter 1. Research Introduction
1.1 Introduction
1.2 Problem Statement and Substantiation
1.3 Research Aims and Objectives
1.4 Expected Outcomes and Deliverables
1.5 Research Methodology
1.6 Provisional Table of Contents
Chapter 2. Cyber Security: A growing threat in Oil and Gas Industry
2.1 Introduction
2.2 Understanding Cyber risks in the Oil and Gas Industry
2.3 Cyber incidents targeting the Energy sector
2.3.1 Survey of Cyber Incidents on PCN Infrastructure
2.3.1.1 Incident 1: GAZPROM pipeline Incident
2.3.1.2 Incident 2: Cyber-Attack on Saudi Aramco
2.3.1.3 Incident 3: The Stuxnet incident at Iranian Nuclear Plant
2.3.1.4 Incident 4: Davis-Besse Nuclear power plant incident
2.4 Growing trends of Cyber-attack
2.4.1 Cyber-Attack Methodology
2.4.1.1 Advanced Persistent Threats
2.4.2 Methods of Operation
2.5 Cyber Security Risks Frameworks
Chapter 3. Machine Learning: A solution to mitigate cyber threats
3.1 Introduction to ML
3.2 Basic ML Methods
3.3 ML in Cyber security
3.4 Understanding ML Models
3.4.1 Overview of Decision Tree
3.4.2 Overview of Neural Networks
3.4.3 Overview of Naïve Bayes
Chapter 4. Empirical Investigation
4.1 Introduction
4.2 Creating the data sets
4.3 Feature extraction of data (CSV format)
4.4 Combine and convert data (ARFF format)
4.5 Interpretation of results and findings
4.5.1 Verification and Validation
4.6 Using active learning method of machine learning for Cyber security intrusion detection for PCN application
Chapter 5. Conclusion
5.1 Conclusion and Recommendations
Appendix A. Naïve Bayes (Cross Validation)
Appendix B. Naïve Bayes (Percentage Split)
Appendix C. Multilayer Perceptron (Cross Validation)
Appendix D. Multilayer Perceptron (Percentage Split)
Appendix E. Random Tree (Cross Validation)
Appendix F. Random Tree (Percentage Split)
References

LIST OF ABBREVIATIONS

PCN Process Control Networks

SCADA Supervisory Control and Data Acquisition

WEKA Waikato Environment for Knowledge Analysis

ICT Information and Communication Technologies

IoT Internet of Things

MBR Master Boot Records

PPC Plant Process Computer

SPDS Safety Parameter Display System

APT Advanced Persistent Threat

ICS Industrial Control Systems

NIST National Institute of Standards and Technology

NISCC National Infrastructure Security Coordination Center

ML Machine Learning

AI Artificial Intelligence

UEBA User and Entity Behavioral Analytics

AV Antivirus

DLP Data Loss Prevention

API Application Programming Interface

LAN Local Area Network


LIST OF FIGURES

Figure 2.1 Cyber Incidents by Sector: Fiscal Year 2012
Figure 2.2 Intruder knowledge versus attack sophistication
Figure 2.3 Attack tree for a MODBUS-based SCADA system
Figure 2.4 An attack step
Figure 2.5 Fiscal Year 2014 incidents reported by access vector (245 total) (The Industrial Control Systems Cyber Emergency Response Team (ICS-CERT), 2014)
Figure 2.6 The ISO 31000:2009 risk management process
Figure 2.7 Generic SCADA hardware architecture
Figure 3.1 The four layers of Data mining and Machine learning
Figure 4.1 Based on the generated data ML models are created that can classify packets.
Figure 4.2 Wireshark screenshot during packet capture
Figure 4.3 Combine and convert data (ARFF format)
Figure 4.4 Active Learning Process
Figure 4.5 Flowchart for Active Learning classification

LIST OF TABLES

Table 4.1 Class Categorization
Table 4.2 WEKA output results – Classifiers comparison analysis
Table 4.3 WEKA output results – Confusion matrix analysis

CHAPTER 1. RESEARCH INTRODUCTION

Chapter 1 introduces the research study. The research context is described to provide background for the research, and the goals of the research and the research methodology adopted are presented. Finally, the thesis organization is outlined.

1.1 Introduction

The petrochemical industry is no stranger to uncertainty and risk. However, its increasing dependence on technology and web-based communication has opened the door to cyber security threats, particularly in the oil and gas sector. These threats are significant: terrorism against hydrocarbon installations, for example, can cause plant shutdowns through sabotage and the interruption of utilities. With the oil and gas sector driving almost every aspect of daily life, protecting this critical infrastructure has never been more important, and the consequences of attacks on the operations and systems that sustain this way of life should not be underestimated.

In recent years, industrial cyber security threats have grown from the esoteric preoccupation of a few specialists into a problem of general concern. All stakeholders now share responsibility for promoting the safety, reliability, and stability of critical industrial infrastructure. With the rising threat of malware on today's open computing platforms, the typical Process Control Network (PCN) is increasingly vulnerable to outside modification and exploitation. Cyber-attacks on plant-automation systems have not only increased in number but have also grown more sophisticated in recent years. From targeted information gathering and theft to the destruction of crucial data, these intrusions represent a real and present danger to plant productivity, reliability, and safety.

Companies in the oil and gas, refining, petrochemical, and power-generation industries, among others, must avert and mitigate cyber security threats to their production operations, including risks to plant infrastructure, equipment, personnel, and the environment. This includes taking proactive steps to protect critical facilities from cyber intrusion, which in turn requires an understanding of current and future cyber security risks, of past incidents in the process sectors, and of ever-changing security challenges. Because attacks on PCNs are now more frequent and increasingly sophisticated, defensive strategies must evolve to keep up. This research explores effective methods with a view to developing a comprehensive framework that mitigates the increasing incidence of cyber security exploits in the petrochemical industry.


1.2 Problem Statement and Substantiation

The number of cyber incidents targeting energy and petrochemical infrastructure has increased significantly over the last few years. Cyberspace and its underlying infrastructure are vulnerable to a wide range of risks emanating from both physical and cyber threats and hazards. Between December 2011 and June 2012, 23 gas pipeline companies were targeted by cyber spies, and confidential data were compromised and stolen. The stolen data are sensitive and could be used to sabotage gas pipelines. According to 2013 data from the US Industrial Control Systems Cyber Emergency Response Team (ICS-CERT), 56% of 257 recorded cyber incidents targeted energy infrastructure, compared with 40% in 2012 [1]. Sophisticated cyber actors and nation-states exploit vulnerabilities to steal information and money, and are armed with capabilities to disrupt, destroy, or threaten the supply of essential services. A further growing concern is the cyber threat to critical petrochemical infrastructure, which is increasingly subject to sophisticated intrusions that pose new risks. In 2014, the energy and petrochemical sectors were identified as a main target for sophisticated threat actors for a variety of reasons [2]. It has been estimated that about 80% of oil and gas companies saw an increase in the number of successful cyber-attacks in 2015 alone. Allied Business Intelligence (ABI) Research calculates that cyber security spending on oil and gas critical infrastructure will reach $1.87 billion by 2018 [3].

As information technology becomes increasingly integrated with physical infrastructure operations, there is an increased risk of wide-scale or high-consequence events that could cause harm or disrupt services upon which the economy and the daily lives of millions of people depend. The PCN, a communications network used to transmit instructions and data between control and measurement units and Supervisory Control and Data Acquisition (SCADA) equipment, is highly vulnerable to these risks. According to [4], cyberspace is particularly difficult to secure because of a number of factors: the ability of malicious actors to operate from anywhere in the world, the links between cyberspace and physical systems, and the difficulty of reducing vulnerabilities and consequences in complex cyber networks.

1.3 Research Aims and Objectives

The aim of this research is to analyze and assess the growing trend of cyber security threats to process control networks (PCNs), with a view to formulating a framework that can mitigate these threats. The research has the following objectives:

a. Assessing the rising incidence of cyber security exploits on PCNs in the petrochemical industry.

b. Developing a framework that mitigates these rising occurrences using a hybrid approach.

These objectives lead to the following research questions:

• What cyber exploits are possible in a PCN environment?

• How can these vulnerabilities be detected and prevented?

• Can the detection and prevention methods adapt to the growing sophistication of cyber intrusions?

1.4 Expected Outcomes and Deliverables

The number of attacks on industrial networks, particularly those of petrochemical plants, is growing rapidly [5]. Attackers are gaining new skills that allow them to bypass defenses that would probably have been effective just a few years ago. Defensive strategies have also evolved, helping users keep their plants safe and their critical information under control. However, these palliatives alone are not enough to keep cyber intrusion at bay. A comprehensive framework therefore needs to be developed as a matter of priority, including strategies capable of mitigating the ever-rising number of cyber incidents in the petrochemical industry.

This research will propose effective methods with a view to developing a comprehensive framework capable of mitigating the increasing incidence of cyber security exploits in the petrochemical industry. The framework will adopt a hybrid approach that combines machine learning predictors with a traditional risk management strategy, in which machine learning-based intrusion detectors play a central role.

1.5 Research Methodology

Analysis of literature and sources of information

The following sources of information were consulted to carry out a comprehensive literature survey:

• Library sources (related books, etc.).

• Journal articles and publications.

• Newspapers, magazines, and reports (oil and gas or petrochemical proceedings, conferences, etc.).

• Theses and dissertations (related work and findings in this field of study).

• Internet sources (search and metasearch engines such as Lycos, AltaVista, Copernic and MetaCrawler, and financial databases such as Macgregor's).

• Project reports (reports of similar cyber security and machine learning projects, used as benchmarks).

• Waikato Environment for Knowledge Analysis (WEKA): the Machine Learning Group website at the University of Waikato [6].

• WEKA's collection of machine learning algorithms for data mining tasks was adopted for the analysis. Because of legal and confidentiality constraints, it was impracticable to conduct this research in an operational petrochemical facility. Instead, the Wireshark packet analyzer was used in a simulation of an actual PCN that models its different architectural layers, and different exploit scenarios were carried out to verify and validate the research outcomes.

• Write-up of the outcomes and findings.
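Feeding Wireshark captures to WEKA requires converting them to WEKA's ARFF format; the CSV feature extraction and ARFF conversion actually used are covered in Chapter 4. A minimal Python sketch of such a conversion is given below, assuming the default column headers of Wireshark's CSV export ("Protocol", "Length") and one class label per capture file (for example, a normal-traffic capture and an exploit-scenario capture). The function name, relation name and sample rows are hypothetical.

```python
import csv
import io


def captures_to_arff(captures, relation="pcn_packets"):
    """Build ARFF text from (csv_text, class_label) pairs, one pair per labelled
    capture. Keeps only the Protocol and Length columns (assumed Wireshark CSV
    export headers). Values are written unquoted, so this sketch assumes
    protocol names contain no spaces, commas or braces."""
    rows = []
    for csv_text, label in captures:
        for record in csv.DictReader(io.StringIO(csv_text)):
            rows.append((record["Protocol"], int(record["Length"]), label))
    protocols = sorted({proto for proto, _, _ in rows})
    labels = sorted({label for _, _, label in rows})
    lines = [
        f"@relation {relation}",
        "",
        f"@attribute protocol {{{','.join(protocols)}}}",  # nominal attribute
        "@attribute length numeric",
        f"@attribute class {{{','.join(labels)}}}",        # class attribute last
        "",
        "@data",
    ]
    lines += [f"{proto},{length},{label}" for proto, length, label in rows]
    return "\n".join(lines) + "\n"


# Hypothetical captures in the shape of a Wireshark CSV export.
HEADER = '"No.","Time","Source","Destination","Protocol","Length","Info"\n'
normal_csv = HEADER + (
    '"1","0.000000","10.0.0.2","10.0.0.10","TCP","66","Query"\n'
    '"2","0.004000","00:11:22:33:44:55","Broadcast","ARP","42","Who has 10.0.0.10?"\n'
)
attack_csv = HEADER + (
    '"1","0.000000","10.0.0.66","10.0.0.10","TCP","60","SYN"\n'
    '"2","0.000100","10.0.0.66","10.0.0.10","UDP","1042","Len=1000"\n'
)
arff = captures_to_arff([(normal_csv, "normal"), (attack_csv, "attack")])
print(arff)
```

Declaring the protocol as a nominal attribute lets WEKA classifiers treat each protocol as a category, and the class attribute is placed last because the WEKA Explorer uses the last attribute as the class by default.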

1.6 Provisional Table of Contents

The thesis is organized into the following chapters:

Summary: This section gives a short overview of the entire research work.

Preface and acknowledgment: This section contains personal comments about the conditions in which my research was conducted and about the persons, institutions and organizations that provided assistance.

Chapter 1 – Introduction: This chapter presents a comprehensive background for my research work.

Chapters 2 and 3 – Literature Review: These chapters contain a detailed review of the information gathered from different sources during my research, including literature, journal articles, textbooks and other related or similar work.

Chapter 4 – Empirical Investigation/Interpretation of Results and Findings: This chapter introduces the research concepts, design, methodology and approaches, and presents and explains the results of my empirical investigation.

Chapter 5 – Conclusion: Conclusions and recommendations are drawn from the results of the entire research work.

References: This section lists all the relevant information sources used throughout the research work.

Glossary and definitions of terms

Appendices

CHAPTER 2. CYBER SECURITY: A GROWING THREAT IN OIL AND GAS INDUSTRY

This chapter explores the growing trend of cyber security threats in the oil and gas industry and investigates their level of sophistication. The impacts of various incidents are also reported.

2.1 Introduction

The energy sector is diverse, spanning renewable energy, coal, electrical and nuclear power, and oil and gas, and securing its infrastructure is a daunting and far from straightforward task. Although some considerations apply across the broader energy sector, different solutions are still used in the various sub-sectors. This thesis focuses on the cyber security landscape of the oil and gas industry, although references are also made to the broader energy industry.

This chapter covers the following topics:

• Understanding cyber risks in the oil and gas industry

• Cyber incidents targeting the energy sector

• Growing trends of cyber-attack

• Why PCN?

2.2 Understanding Cyber risks in the Oil and Gas Industry

According to [7], the oil and gas industry comprises the exploration, extraction, refining, processing, transport, distribution, and sale of petroleum and gas products. Petroleum products include fuel oil and gasoline, while natural gas is a major source of electricity generation [7]. The installations and infrastructure that support oil and gas processes are often distributed across wide geographic areas, which creates a high demand for distributed control systems that provide remote monitoring and control with real-time data exchange. Although energy companies operate complex industrial environments, their infrastructure is underpinned by legacy control systems. These systems are vulnerable and can be highly susceptible to cyber-attacks when connected to modern information and communication technologies (ICT) [7].

In my view, cyber security has become a growing concern over the past decade with the development of sophisticated malware targeting critical infrastructure. While these threats have primarily targeted government, military, and financial institutions, the energy sector has not been spared. Oil and gas companies have been the target of widespread cyber infiltration in the past few years, with hostile agents successfully stealing intellectual property and valuable confidential information [8]. Perhaps more disconcertingly, industrial control systems (ICS) in oil and gas installations are increasingly coming under siege from cyber-attacks. Protecting the energy infrastructure is consequently not only urgent but fundamental.

While the security of physical structures has long been mastered, cyber security is a rising concern. The energy industry is not exempt from the increasing connectivity of modern organizations: connecting online is no longer a choice for businesses but a requirement. According to [9], today's oil and gas industry has become technologically complex. More and more processes are being digitized, data mining and analytical programs are used more frequently, and sensors are everywhere. This may improve efficiency, but it also makes systems more vulnerable to cyber-attacks. As competition in the industry intensifies and the aftermath of the economic downturn continues, energy companies are investing in the latest technology to help cut costs and improve efficiency [7].

Since the advent of the Internet of Things (IoT), connectivity appears to have made SCADA and ICS more vulnerable to cyber-attacks [10]. According to [10], with the growing use of smart grid technology, more new energy systems are connected to the so-called IoT, which in turn opens up new security vulnerabilities because of the sheer number of connected systems and the weak or nonexistent security often placed around simple devices. Large energy producers and power plants typically integrate Supervisory Control and Data Acquisition (SCADA) systems into their networks, and SCADA systems appear to be among the easiest targets for hackers and cyber terrorists. In the past, Industrial Control Systems (ICS) were isolated from the rest of the world; however, the advent of modern operating systems (OS) and the internet has made information sharing and connectivity a necessary evil [10]. Hackers use tools such as Metasploit, which can assist in hacking anything from a small webcam to a turbine control system or a tank management system. Phishing emails now easily reach the computers of corporate executives and employees alike, and hackers exploit human error to break into systems and steal sensitive information.

2.3 Cyber incidents targeting the Energy sector

According to [10], energy companies are most vulnerable to cyber-attacks during mergers and acquisitions, which require complex integration of information technology systems that may become susceptible to data breaches and cyber exposures. In most cases, cyber security is ignored during a merger or acquisition, and as a result the companies involved may become susceptible to data breaches and other cyber risks in the future. In a survey by the international law firm Freshfields Bruckhaus Deringer, 90% of respondents believed that cyber-breaches would reduce deal value, and 83% of dealmakers believed a deal could be abandoned if cyber security breaches were identified during due diligence or mid-transaction [10]. Dealmakers' top concerns include targets suffering cyber-attacks during deal discussions, the target being a proven victim of data or intellectual property (IP) theft by cyber-attack, and evidence that a target has not handled a past breach effectively (leading to fines, damage to reputation, etc.). Interestingly, acquirers (30%) are the most concerned about cyber security issues derailing transactions, whereas 81% of sellers are unconcerned or only slightly concerned about the risk of derailment. One of the biggest threats could be unauthorized access to critical and proprietary information by malicious insiders and/or outsiders. Data breaches could be another problem, since both companies hold large volumes of data and information that must be integrated.

2.3.1 Survey of Cyber Incidents on PCN Infrastructure

According to [11], the North American Electric Reliability Corporation Critical Infrastructure Protection (NERC CIP) standard CIP-001-1 treats “Disturbances or unusual occurrences, suspected or determined to be caused by sabotage” as reportable incidents. This definition provides a useful guideline for determining what can be considered a PCN security incident in this research work. The following is a survey of several critical infrastructure cyber security incidents reported in the energy sector. Each incident is described together with a summary of its root causes. It must be noted, however, that not all of the incidents identified were due to external threats.

2.3.1.1 Incident 1: GAZPROM pipeline Incident

Gazprom, a major Russian oil and gas producer and transportation company, suffered a cyber-attack by hackers in 1991. According to [12], the attack was carried out in collaboration with a Gazprom insider, a disgruntled employee who evidently helped a group of hackers gain access to and control of Gazprom's computer systems [11]. Using a Trojan horse, the hackers reportedly gained control of the central switchboard that controls gas flow in the pipelines [11] [12]. A major part of these systems is responsible for transporting gas through several pipelines across Europe. These pipelines are of great importance and, as such, have been the subject of several international disputes [11]. The root causes are summarized below:

• Ineffective or weak malware protection system;

• Interconnectivity between the PCN and the corporate business network;

• Inappropriate firewall rules for filtering of network traffic; and

• Remote access capability to critical PCN infrastructure.

2.3.1.2 Incident 2: Cyber-Attack on Saudi Aramco

On 15 August 2012, the computer network of Saudi Aramco was struck by a self-replicating virus that infected about 30,000 of its Windows-based computers [13]. According to [13], despite its vast resources as Saudi Arabia's national oil and gas firm, Aramco took almost two weeks to recover from the incident. Viruses frequently appear on the networks of multinational firms, but it is shocking that a cyber-attack of this scale was carried out against a company whose infrastructure is so critical to global energy markets [13]. The virus, later identified as Shamoon, caused considerable disruption to the world's largest oil producer.

Shamoon was designed to indiscriminately delete critical data from computer hard drives, including the Master Boot Record (MBR), making the computers difficult to boot. According to [14], a group calling itself the “Cutting Sword of Justice” claimed responsibility for the attack in a Pastebin message posted on the day of the attack in 2012, justifying it as a measure against the Saudi monarchy. Although the attack did not result in an oil spill, explosion or other major fault in Aramco's operations, it affected the company's business processes, and some drilling and production information may also have been lost [13]. According to many reports, Shamoon allegedly also spread to the networks of other oil and gas firms, including RasGas [14]. The incident came after years of advisories and warnings about the risk of cyber-attacks against critical energy and economic infrastructure.

2.3.1.3 Incident 3: The Stuxnet incident at an Iranian Nuclear Plant

In June 2010, an Iranian nuclear control systems facility located at Natanz was infected with a worm known as Stuxnet [12]. Stuxnet is a computer worm designed to allow attackers to sabotage industrial plants by modifying the code in the control systems it infects. Stuxnet used four ‘zero-day vulnerabilities’, that is, flaws that were unknown to the software vendor before the attack, leaving the software’s author with zero days in which to create patches or advise workarounds [15]. The worm exploited Siemens’ default passwords to access Windows computers running the WinCC and PCS 7 application programs [11]. The worm is capable of locating frequency-converter drives manufactured by Fararo Paya in Iran and Vacon in Finland. These drives are used to power the centrifuges that enrich uranium, that is, increase the concentration of the uranium-235 isotope. According to [12], Stuxnet distorted the frequency of the electrical current supplied by the converter drives, causing the centrifuge motors to oscillate between high and low speeds for which they were not designed. Consequently, this switching caused the centrifuges to fail at a higher than normal rate [12].

2.3.1.4 Incident 4: Davis-Besse Nuclear power plant incident

On January 25, 2003, an engineer at the Davis-Besse plant in Ohio used a virtual private network (VPN) connection to access the plant from his home [16]. Although the connection was encrypted, his home computer was infected with the Slammer worm, which spread to the nuclear plant’s computers and caused a key safety control system to fail for nearly five hours. According to [11], the worm crashed the Safety Parameter Display System (SPDS) and the Plant Process Computer (PPC). Restoring the SPDS took approximately four hours and fifty minutes, and restoring the PPC took six hours and nine minutes [11].

The Slammer worm was designed to reside in system memory and search for other hosts to infect. Although the Slammer worm carries no destructive payload, it is still capable of causing extensive disruption. It searches for new hosts by scanning random IP addresses, which generates a large volume of spurious traffic, consuming bandwidth and congesting the networks [16]. The root-cause findings are summarized below:

• Interconnectivity between the PCN and the corporate business network;

• No firewall between the PCN and the corporate business network; and

• Lack of regular Windows patch updates on machines within the PCN (the patch for the Microsoft SQL Server (MSSQL) vulnerability that the Slammer worm exploited had been available six months before the incident) [11].
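The propagation mechanism described above, in which each infected host keeps probing randomly chosen IP addresses, can be illustrated with a simple expected-value epidemic model. The Python sketch below is a hypothetical illustration, not a model taken from [16] or [11]: the number of vulnerable hosts (75,000), the scan rate (4,000 probes per second) and the simulated duration are assumed values chosen only to show the shape of the curve.

```python
import math

ADDRESS_SPACE = 2 ** 32  # every IPv4 address is assumed equally likely to be probed


def simulate_random_scanning(vulnerable_hosts, scans_per_step, steps, initially_infected=1):
    """Expected-value model of a random-scanning worm.

    Returns one (step, infected_hosts, probes_sent) tuple per time step.
    """
    infected = float(initially_infected)
    history = []
    for step in range(steps):
        probes = infected * scans_per_step
        # Chance that a particular vulnerable, uninfected host is hit by at least
        # one of this step's probes: 1 - (1 - 1/ADDRESS_SPACE) ** probes,
        # computed with log1p/expm1 to avoid rounding error.
        p_hit = -math.expm1(probes * math.log1p(-1.0 / ADDRESS_SPACE))
        history.append((step, infected, probes))
        infected = min(float(vulnerable_hosts),
                       infected + (vulnerable_hosts - infected) * p_hit)
    return history


# Hypothetical parameters: 75,000 vulnerable hosts, 4,000 probes per infected
# host per second, 600 one-second steps (10 minutes).
history = simulate_random_scanning(75_000, 4_000, 600)
for step, infected, probes in history[::60]:
    # Nearly all probes land on addresses with no vulnerable host behind them,
    # so they are pure spurious traffic on the network.
    print(f"t={step:3d}s  infected={infected:9.0f}  probes/s={probes:13.0f}")
```

In this model the probe volume grows in step with the number of infected hosts, so a worm with no destructive payload can still saturate network links, which is consistent with the congestion described above.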

2.4 Growing trends of Cyber-attack

The number of attacks faced by the energy industry is on the rise, according to a survey by Tripwire [9]. The research revealed that 77% of respondents had seen an increase in successful cyber-attacks in the past 12 months. According to an article published in SecurityWeek in September 2015, the systems of the U.S. Department of Energy were breached more than 150 times between October 2010 and October [17] [18]. Furthermore, a report published in November revealed that high-profile cyber-attacks targeting the oil and gas industry are expected to drive security spending from $26.3 billion in 2015 to $33.9 billion by 2020. The report further highlighted that 82% of oil and gas industry respondents said their organizations had registered an increase in successful cyber-attacks over the past 12 months [18]. Moreover, 53% of the respondents said that the rate of cyber-attacks had increased by between 50% and 100% over the past month [18]. The report also revealed that the increase in attacks occurs across industries, but the data show that energy organizations are experiencing a disproportionately large increase compared with other industries [18]. Given these findings, energy organizations face unique challenges in protecting Process Control Systems and SCADA assets. Figure 2.1 below depicts the breakdown of cyber incidents by sector in fiscal year 2012.

Figure 2.1 Cyber Incidents by Sector: Fiscal Year 2012 [19].
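The security-spending forecast quoted from [18], $26.3 billion in 2015 rising to $33.9 billion by 2020, implies a compound annual growth rate (CAGR). The short Python calculation below shows the arithmetic; the CAGR figure is derived here for illustration and is not stated in [18].

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that turns start_value into end_value over `years` years."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Forecast quoted from [18]: $26.3 billion in 2015 rising to $33.9 billion by 2020.
rate = cagr(26.3, 33.9, 2020 - 2015)
print(f"Implied growth: {rate:.1%} per year")
```

This prints an implied growth of about 5.2% per year, that is, the forecast corresponds to steady rather than explosive growth in spending.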

According to [9], not only has the number of cyber security attacks increased, but their sophistication has also risen over time. Substantial evidence shows that cyber security threats have escalated rapidly since 2008, when the industry saw the first nation-state attacks [9]. In those attacks, valuable assets such as bid-lease data, seismic markups and intellectual property were stolen from very large oil and gas companies. Further reports have revealed that the attacks have intensified in the years since, possibly because of the geopolitical significance of natural resources, the spread of information technology convergence in the field (IoT), and the specialized intellectual property that has been created in the drilling and production sector [9]. Figure 2.2 below depicts the trend of cyber-attacks between 1980 and 2010 [11].

Figure 2.2 Intruder knowledge versus attack sophistication [11].

The Founder and Principal ICS Security Consultant at Applied Risk re-emphasized that the growing number of incidents demonstrates that critical-infrastructure companies must move cyber security higher up the agenda. For instance, in April 2015, the U.S. Department of Energy warned of the risk of terrorism against ageing energy infrastructure [17]. A few months later, it was discovered that its own computer systems had been the subject of continuous infiltration since 2010 [17]. He further stressed that, over the coming years, incursions by nation states or terrorist adversaries will grow exponentially as they target nuclear facilities, power grids and oil and gas pipelines [17]. The driving force behind these attacks is economic and strategic gain [17].

An attack campaign against control systems known as Energetic Bear (also called ‘Crouching Yeti’) is particularly relevant because it demonstrates how effective commonly used attack mechanisms can be [20]. According to [20], the Russian security software vendor Kaspersky Lab published an in-depth report claiming that Energetic Bear attacks had successfully compromised more than 2,800 victims, including some 100 organizations in the United States, Spain, Japan, Germany, France, Italy, Turkey, Ireland, Poland and China. While Energetic Bear is wide in scope, researchers at the security firm Symantec discovered that, as early as March 2014, the group had shifted its focus to energy firms, with half of its targets in the energy sector and 30% in energy control systems [20]. Symantec also reported that Energetic Bear attacks against control systems were successful to the extent that they “could have caused damage or disruption to energy supplies in affected countries”, and that the targets included “energy grid operators, major electricity generation firms, petroleum pipeline operators, and energy industry industrial control system equipment manufacturers” [20].

2.4.1 Cyber-attack Methodology

There have been several investigations into how cyber-attackers such as those associated with the Energetic Bear campaign managed to compromise the control systems of so many companies. According to [20], evidence shows that the Energetic Bear attacks were conducted using commonly known, easily executed attack methods against widely known system vulnerabilities. In many cases, the attackers used variants of a well-known piece of malicious software, the Havex Trojan. Metasploit, a free tool that requires almost no programming skill to operate, was also in frequent use [20]. As an example, Figure 2.3 shows the attack tree for exploiting PCN or SCADA MODBUS devices.

Figure 2.3 Attack tree for a MODBUS-based SCADA system [21].

According to [21], attack trees are used to assess vulnerabilities in SCADA

and PCN systems based on MODBUS and MODBUS/TCP communication protocols.

An attack tree provides a structured view of events leading to an attack and,

ultimately, helps with the identification of appropriate security countermeasures.

Risk, according to [21], depends on the following:

1. System architecture and conditions;

2. Countermeasures in place;

3. Attack difficulty;

4. Detection probability; and

5. Attack cost.
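To show how an attack tree turns the factors above into a comparison of attack paths, the Python sketch below evaluates a small AND/OR tree: an OR node is satisfied by any one child (the attacker picks the cheapest), while an AND node needs every child, so costs add and the attack stays hidden only if no step is detected. The node names, costs and detection probabilities are hypothetical and are not taken from Figure 2.3 or [21]; detection events are assumed to be independent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AttackNode:
    """A leaf attack step, or an AND/OR goal built from child nodes."""
    name: str
    kind: str = "LEAF"      # "LEAF", "AND" (all children needed) or "OR" (any one child)
    cost: float = 0.0       # attacker effort or cost, leaves only
    p_detect: float = 0.0   # probability the step is detected, leaves only
    children: List["AttackNode"] = field(default_factory=list)


def cheapest_attack(node: AttackNode) -> Tuple[float, float, List[str]]:
    """Return (cost, probability of going undetected, leaf steps) of the cheapest path."""
    if node.kind == "LEAF":
        return node.cost, 1.0 - node.p_detect, [node.name]
    results = [cheapest_attack(child) for child in node.children]
    if node.kind == "OR":
        return min(results, key=lambda r: r[0])  # attacker chooses the cheapest option
    if node.kind == "AND":
        cost = sum(r[0] for r in results)
        p_undetected = 1.0
        steps: List[str] = []
        for _, p, leaf_steps in results:
            p_undetected *= p  # every step must escape detection
            steps.extend(leaf_steps)
        return cost, p_undetected, steps
    raise ValueError(f"unknown node kind: {node.kind}")


# Hypothetical tree for a MODBUS-based PCN.
gain_access = AttackNode("Gain access to PCN", "OR", children=[
    AttackNode("Phish a corporate workstation", cost=2, p_detect=0.3),
    AttackNode("Abuse a remote-access link", cost=5, p_detect=0.2),
])
forge_frames = AttackNode("Inject forged MODBUS/TCP frames", "AND", children=[
    gain_access,
    AttackNode("Send unauthenticated write command", cost=1, p_detect=0.1),
])
root = AttackNode("Manipulate MODBUS field device", "OR", children=[
    forge_frames,
    AttackNode("Physically tamper with RTU", cost=8, p_detect=0.6),
])

cost, p_hidden, steps = cheapest_attack(root)
print(cost, round(p_hidden, 2), steps)
```

With these assumed numbers the cheapest path is phishing followed by a forged write command (cost 3, about a 63% chance of going undetected). A countermeasure can be modelled by raising a leaf's cost or detection probability and re-evaluating the tree.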

Figure 2.4 below illustrates the steps involved in a cyber-attack.

Figure 2.4 An attack step [21].

According to [20], malicious code associated with the Energetic Bear attack campaign was distributed using several primary methods, including “spear-phishing” and “waterholing” attacks as well as compromised SCADA software updates. Spear-phishing attacks, as defined by [22], are exploratory attacks carried out by sending an email with a malicious link or attachment to a targeted list of users. At first glance, this may sound similar to the spam that you receive every day in your inbox. The important difference is that these emails are sent to a very specific set of individuals about whom the attackers typically know a good deal. Consequently, they can be constructed in a manner that makes them seem much more legitimate than random spam. By mining social networks for personal information about targets, an attacker can write emails that are extremely accurate and compelling. Once the target clicks on a link or opens an attachment, the attacker establishes a foothold in the network, enabling them to complete their illicit mission. Spear-phishing is the most common delivery method for advanced persistent threat (APT) attacks [22].

Another technique commonly used by hackers is known as waterholing, in which threat actors compromise a carefully selected website that their targets are likely to visit, inserting an exploit that results in a malware infection. In the case of Energetic Bear, attackers exploited the websites of control system manufacturers where system updates are maintained. By replacing legitimate updates on these sites with copies that contained malicious code, hackers could ensure that their targets would infect their own systems. Notably, this technique can work even if the target control system is standalone, that is, not connected to any external network, because the operators themselves transfer and install the compromised update.

In fiscal year 2014, ICS-CERT observed greater variety in the characteristics of the reported incidents, as depicted in Figure 2.5 below [23]. Although spear-phishing remained a popular infection vector because of its effectiveness, a wider variety of techniques was reported that year.

Figure 2.5 Fiscal Year 2014 incidents reported by access vector (245 total) [23].

While ICS-CERT had previously observed strategic watering hole attacks, a new technique used trojanized software installers on various vendors’ sites to install malware on unsuspecting users’ networks along with the software update. Many of the victims were unaware that they had been compromised. As expected, social engineering continued to be a popular attack method, enhanced by the use of social media. In some cases, this yielded greater success for attackers.

These attack methodologies are consistent with the recently published Cisco

Annual Security Report that reveals that hackers have increasingly shifted their focus

from seeking to compromise servers and operating systems to seeking to exploit

computer users at the browser and email levels [20].

2.4.1.1 Advanced Persistent Threats

According to [24], operational technology relies on obsolete security models based on unfounded assumptions. Although there have been catastrophic cyber-attacks on ICSs, a bigger and perhaps more pressing challenge for owners and operators of control systems is the viruses, spyware and malware that migrate from IT systems to control systems [24]. Viruses are accidentally introduced to control systems every day through engineers’ laptops, websites, emails, removable drives, and external computers that for some reason are interconnected with the control system. These infections are more of a nuisance than a real danger to the system, but they cause delays, shutdowns and other problems every day. The general fear stems from catastrophic attacks that may or may not happen; the daily struggle, however, is with viruses and malware, which often look like software errors, are costly to deal with and cause unplanned downtime [24]. It is often difficult for an engineer to tell a virus from a software error when equipment is malfunctioning. Cyber-attacks lead to increased processor and memory usage on the attacked host and may generate heat, which can also lead to software errors or hardware failure, so the real problem is often difficult to diagnose.

For instance, it has only recently been disclosed that hackers blew up a segment of a Turkish oil pipeline in 2008 [24]. In the control room, the operator’s console showed that everything was normal until a phone call from the field raised the alarm. A particularly dangerous aspect of this attack was that the attackers also manipulated the CCTV feed to the control room, covering up what was happening at the site [24]. According to [24], the attack methodology shows some similarities with the Stuxnet incident in 2010, in which operator consoles showed normal operations while the centrifuges of the Iranian Natanz nuclear facility were running at such high speeds that they were destroyed. It was further alleged that Stuxnet had already been resident in the attacked control system for a few years before the attack took effect [24]. The attack set the Iranian nuclear facility back several years.

This raises the question of why Stuxnet was successful and why an attack of that magnitude went undetected. According to [24], the Stuxnet incident has since been categorized as an APT. These attacks have a specific target in mind and are advanced because they involve a high level of coordinated human involvement to monitor and control the attack from one or more control centers. The persistent part refers to the capability of the attack to remain invisible to the target for as long as possible, with priority given to completing its mission and leaving the attacked system undetected. APTs use deep knowledge of the system and of attack techniques to ensure a covert operation. APT actors have three things that the system owner does not have: people, money and time. The attack program may be removed if it is discovered, or it may be programmed to destroy itself, which means that the attack leaves very few traces on the attacked system.


2.4.2 Methods of Operation

According to [12], a perpetrator could exploit a PCN to carry out an attack in several ways. These are summarized below:

• Misuse of Resources: Unauthorized use of IT resources, for example, storing unauthorized files on a server or using the site as a springboard for further unauthorized activity.

• User Compromise: Perpetrator gains unauthorized use of user privileges on a

host.

• Root Compromise: Perpetrator gains unauthorized administrator privileges

on a host.

• Social Engineering: Gaining unauthorized access to privileged information

through human interaction and targeting people’s minds rather than their

computers.

• Virus: A piece of code that, when run, attaches itself to other programs and runs again whenever those programs are run.

• Web Compromise: Using vulnerabilities in a website to perform an attack.

• Trojan: A Trojan is a program that adds subversive functionality to an

existing program.

• Worm: A program that propagates itself by attacking other machines and

copying itself to them.

• Recon: Scanning or probing a site to see what services are available and determining what vulnerabilities exist that may be exploited.

• Denial of Service: An exploit whose purpose is to deny legitimate users the use of a service, for example by crashing or hanging a program or the entire system.

Michael Bell, President, CEO and Member of the Board of Directors of Silver Spring Networks, asserted in [20] that in dealing with cyber threats to energy systems, companies not only struggle to assess the risk but also often fail to develop the in-house tools to understand their own response. He further stressed that “Everyone is rushing to adopt technologies but standards need to be used and best practices need to be implemented” [20]. Similarly, O.H. Dean Oskvig, Vice Chair for North America, World Energy Council, and President and CEO of B&V Energy, noted in [20]: “There are two types of companies: ones that have been hacked and the other ones that don’t know they’ve been hacked.” He noted that most energy infrastructure was designed before modern IT tools and systems existed, and that security to protect this infrastructure tends to focus on physical defenses at the expense of addressing cyber threats [20].

2.5 Cyber Security Risk Frameworks

It may appear that the likelihood of catastrophic cyber-attacks on SCADA and PCN systems is comparatively low. This may lead to a false sense of security if we overlook two key points. First, only a small number of cyber security incidents are reported relative to the total number of attacks. According to [21], only a small fraction of the cyber events that actually occur are reported and documented in traditional business crime reporting databases.

Therefore, developing a cyber security risk assessment methodology involves providing a platform for enterprise-wide cyber security awareness and risk analysis. According to the National Institute of Standards and Technology (NIST) framework methodology, risk assessment has two parts, namely conformance assessment and risk analysis; both must be in place to ensure a preventive approach to mitigating cyber threats.

The risk management process highlighted in [21] is depicted in Figure 2.6. The steps involved in the risk management framework are as follows:

1. Risk management – This involves coordinated activities to direct and control an organization with regard to risk [21].

2. Risk assessment – This step involves the overall process of risk identification, risk analysis and risk evaluation [21].

3. Risk identification – This is the process of finding, recognizing and describing risks [21].

4. Risk analysis – This process entails understanding the nature of risk and determining the level of risk [21].

5. Risk evaluation – This is the final stage of risk assessment, in which the results of risk analysis are compared with risk criteria to determine whether the risk and its magnitude are acceptable or tolerable [21].

Figure 2.6 The ISO 31000:2009 risk management process [25].
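Steps 3 to 5 above (identification, analysis and evaluation) can be sketched as a small risk register. The Python example below assumes a qualitative 1 to 5 scale for likelihood and consequence and scores each risk as their product; this scoring matrix is a common convention rather than something prescribed by ISO 31000 or [21], and the risks and the tolerable threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Risk:
    description: str
    likelihood: int   # hypothetical scale: 1 (rare) to 5 (almost certain)
    consequence: int  # hypothetical scale: 1 (insignificant) to 5 (catastrophic)

    @property
    def level(self) -> int:
        # Risk analysis: a common qualitative convention scores a risk as
        # likelihood x consequence (not prescribed by ISO 31000 itself).
        return self.likelihood * self.consequence


def evaluate(risks: List[Risk], tolerable_level: int) -> List[Risk]:
    """Risk evaluation: compare each level with the risk criterion and return
    the risks that are not tolerable, highest level first."""
    return sorted((r for r in risks if r.level > tolerable_level),
                  key=lambda r: r.level, reverse=True)


# Risk identification: an illustrative, hypothetical PCN risk register.
register = [
    Risk("Malware introduced via an engineering laptop", likelihood=4, consequence=4),
    Risk("Unpatched HMI exploited from the corporate network", likelihood=3, consequence=5),
    Risk("Insider misuse of remote access", likelihood=2, consequence=5),
    Risk("Spear-phishing of business users", likelihood=4, consequence=2),
]

for risk in evaluate(register, tolerable_level=9):
    print(f"{risk.level:2d}  {risk.description}")
```

With a tolerable level of 9, three of the four hypothetical risks (levels 16, 15 and 10) would be passed on for treatment, while the spear-phishing risk (level 8) would be accepted under this criterion.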


According to [21], a review of the state of the art in risk assessment of

SCADA or PCN systems is urgently required to form a new categorization scheme for

risk assessment methods.

Several risk models and frameworks directly attempt to address the cyber security challenges of SCADA and PCN environments, each with a varying degree of effectiveness. In 2004, NIST released a publication pertaining to the risks and objectives of SCADA and PCN systems [21]. Similarly, in 2005, the National Infrastructure Security Co-ordination Centre (NISCC), a predecessor of the Centre for the Protection of National Infrastructure (CPNI) in the United Kingdom, published a best practice guide for firewall deployment in SCADA networks [21]. In 2007, the U.S. President’s Critical Infrastructure Protection Board and the Department of Energy published steps an organization must put in place to improve the security of its SCADA networks. Subsequently, in 2008, NIST released comprehensive guidance on a wide range of security issues and on technical, operational and management security controls [21]. This guide was later updated in 2011. Figure 2.7 below illustrates a generic SCADA hardware architecture.

Figure 2.7 Generic SCADA hardware architecture [21].


The NIST Cybersecurity Framework in the United States and the international standard ISO/IEC 21827 (Information technology – Security techniques – Systems Security Engineering – Capability Maturity Model) define core principles for securing ICSs. The following five components of cyber security philosophy are also the basis for a defense-in-depth strategy:

1. Identify: continuously identifying, evaluating and managing cyber risks using best practice risk assessment and management methods.

2. Protect: structured and robust built-in security architecture, network perimeter

protection, host protection, network protection, interface protection, and

secure remote connection.

3. Detect: capabilities to detect viruses and other cyber annoyances, as well as

sophisticated cyber-attacks such as APTs, both on the network and inside of

the system and each host.

4. Respond: well-established and efficient processes to handle cyber-attacks.

5. Recover: ability to quickly return to normal or degraded operation after an

attack – the after-the-fact part of defense in depth. There are some cyber-

attacks that are not possible to prevent or respond to. Most often, such cyber-

attacks are APTs and other catastrophic attacks that have a very low

probability of happening and a high impact, should they occur.


CHAPTER 3. MACHINE LEARNING: A SOLUTION TO

MITIGATE CYBER THREATS

This chapter examines the use of machine learning (ML) in cyber security. It further provides the reader with an overview of the vast range of applications in which ML has been adopted. Finally, it outlines a set of basic yet effective algorithms that could be used to counter the growing menace of cyber threats.

3.1 Introduction to Machine Learning

According to [26], ML can be defined as a method of data analysis that automates analytical model building, using algorithms that iteratively learn from data, which allows computers to find hidden insights without being explicitly programmed where to look. ML is a form of artificial intelligence (AI) that gives a computer the ability to learn by itself without being explicitly programmed [26]. The process involved in ML is similar to that of data mining. Both search through data for patterns, but instead of extracting information for human comprehension, ML uses the detected patterns to adjust program actions accordingly. Furthermore, ML focuses on the development of computer programs that can change when exposed to new data [27].

According to SAS, ML evolved from pattern recognition and the theory that computers can learn without being programmed to perform specific tasks; researchers interested in AI wanted to see whether computers could learn from data [26]. ML is a continuous process of self-training, because as models are exposed to new data, they can adapt independently [26]. In short, ML learns from previous computations to produce reliable, repeatable decisions and results. Widely publicized examples of ML applications include the following:

• The self-driving Google car, which relies on ML.

• Fraud detection, one of the more obvious and important uses of ML in the world today.

The importance and benefits of ML lie in the ability to produce, quickly and automatically, models that can analyze bigger, more complex data and deliver faster, more accurate results, even on a very large scale [26]. Consequently, by building accurate models, an organization has a better chance of identifying profitable opportunities or avoiding unknown risks.

According to [26], a good ML system requires the following:

• Data preparation capabilities;

• Algorithms – basic and advanced;

• Automation and iterative processes;

• Scalability; and

• Ensemble modeling.

3.2 Basic Machine Learning Methods

The two most widely adopted ML methods are supervised learning and unsupervised learning. Other popular methods besides these two are semi-supervised learning and reinforcement learning.

a. Supervised learning algorithms are trained using labeled examples, that is, inputs for which the desired output is known. For example, a piece of equipment

could have data points labeled either “F” (failed) or “R” (runs). The learning

algorithm receives a set of inputs along with the corresponding correct

outputs, and the algorithm learns by comparing its actual output with correct

outputs to find errors. It then modifies the model accordingly. Through

methods like classification, regression, prediction and gradient boosting,

supervised learning uses patterns to predict the values of the label on

additional unlabeled data. Supervised learning is commonly used in

applications where historical data predict likely future events. For example, it

can anticipate when credit card transactions are likely to be fraudulent or

which insurance customer is likely to file a claim.

b. Unsupervised learning is used against data that has no historical labels. The

system is not told the “right answer.” The algorithm must figure out what is

being shown. The goal is to explore the data and find some structure within.

Unsupervised learning works well on transactional data. For example, it can

identify segments of customers with similar attributes who can then be treated

similarly in marketing campaigns, or it can find the main attributes that

separate customer segments from each other. Popular techniques include self-

organizing maps, nearest-neighbor mapping, k-means clustering and singular

value decomposition. These algorithms are also used to segment text topics,

recommend items and identify data outliers.

c. Semi-supervised learning is used for the same applications as supervised

learning, but it uses both labeled and unlabeled data for training – typically, a

small amount of labeled data with a large amount of unlabeled data (because


unlabeled data is less expensive and takes less effort to acquire). This type of

learning can be used with methods such as classification, regression, and

prediction. Semi-supervised learning is useful when the cost associated with

labeling is too high to allow for a fully labeled training process. Early

examples of this include identifying a person’s face on a webcam.

d. Reinforcement learning is often used for robotics, gaming and navigation.

With reinforcement learning, the algorithm discovers through trial and error

which actions yield the greatest rewards. This type of learning has three

primary components: the agent (the learner or decision maker), the

environment (everything the agent interacts with) and actions (what the agent

can do). The objective is for the agent to choose actions that maximize the

expected reward over a given amount of time. The agent will reach the goal

much faster by following a good policy. So, the goal in reinforcement learning

is to learn the best policy.
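The supervised learning loop described in item (a), in which the algorithm compares its output with the correct label and modifies the model when they differ, can be illustrated with a perceptron, one of the simplest learning algorithms. The pure-Python sketch below trains on equipment data labelled “F” (failed) or “R” (runs); the two features (vibration and bearing temperature) and all readings are hypothetical, and the perceptron is used here only because it shows the error-correction step clearly, not as the method used in this study.

```python
from typing import List, Sequence, Tuple

# Hypothetical features: (vibration in mm/s, bearing temperature in degrees C).
Sample = Tuple[float, float]
Model = Tuple[List[float], float, List[float]]  # (weights, bias, feature means)


def train_perceptron(samples: Sequence[Sample], labels: Sequence[str], epochs: int = 20) -> Model:
    """Train a perceptron on labelled equipment data: "F" (failed) or "R" (runs)."""
    # Centre each feature on its training mean so the bias term need not grow large.
    means = [sum(s[j] for s in samples) / len(samples) for j in range(2)]
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for s, label in zip(samples, labels):
            x = [s[j] - means[j] for j in range(2)]
            target = 1 if label == "F" else -1
            predicted = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else -1
            if predicted != target:
                # Compare the actual output with the correct output and adjust the model.
                w = [w[j] + target * x[j] for j in range(2)]
                b += target
                errors += 1
        if errors == 0:  # every labelled example is now classified correctly
            break
    return w, b, means


def predict(model: Model, sample: Sample) -> str:
    w, b, means = model
    x = [sample[j] - means[j] for j in range(2)]
    return "F" if w[0] * x[0] + w[1] * x[1] + b > 0 else "R"


# Hypothetical labelled history for one type of pump.
samples = [(2.0, 60.0), (7.0, 85.0), (3.0, 65.0), (8.0, 90.0), (2.5, 62.0), (7.5, 88.0)]
labels = ["R", "F", "R", "F", "R", "F"]
model = train_perceptron(samples, labels)
print(predict(model, (7.2, 87.0)), predict(model, (2.2, 61.0)))
```

Once trained, the model labels new, unlabelled readings, which is the “predict the values of the label on additional unlabeled data” step described in item (a); here it prints `F R`.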

3.3 Machine Learning in Cyber security

ML and its parent technology, AI, undoubtedly offer huge opportunities, and their present analytic capabilities could help mitigate many of the challenges currently faced by digital systems. In recent times, ML has been hailed as a brand new weapon emerging from the multilayered discipline of data science to penetrate the sphere of cyber security [28]. The application of ML to address the growing trend of cyber threats is gaining popularity within the research community. According to ABI Research [29], cyber threats are an ever-present danger to global economies and are projected to surpass the trillion-dollar mark in damages within the next year. In view of this, the cyber security industry is investing heavily in ML to provide a more dynamic prevention approach [29]. Furthermore, ABI Research forecasts that ML in cyber security will boost big data, intelligence and analytics spending to $96 billion by 2021 [29].

Dimitrios Pavlakis, an Industry Analyst at ABI Research, predicted an era in which an AI security revolution will drive ML solutions [29]. ML is poised to emerge as the new norm beyond Security Information and Event Management (SIEM), and ultimately to displace a large portion of traditional antivirus (AV), heuristics and signature-based systems within the next five years [29]. ABI Research further identifies the government and defense, banking and technology market sectors as the primary drivers and adopters of ML technologies [29]. The energy sector, as usual, is slow to adopt sudden changes in technology because of initial skepticism; however, the application of ML in this industry is growing gradually. Documented uses of ML within the energy industry range from finding new energy sources and predicting refinery sensor failures to streamlining oil distribution to make it more efficient and cost-effective. Moreover, User and Entity Behavioral Analytics (UEBA) and Deep Learning algorithm designs are emerging as the two most prominent technologies in cyber security offerings, especially from innovative technology startups [29]. Meanwhile, established AV companies such as Symantec continue to evolve some of their solutions from highly trained supervised models to unsupervised and semi-supervised ones in preparation for constantly shifting threat variables [29]. According to ABI findings [29], SIEM techniques are expected to be broken up and integrated into different functions of UEBA, unsupervised and deep learning solutions. Consequently, signature-based AV systems are expected to disappear as standalone systems and survive only as a subsection of supervised ML models [29]. Meanwhile, enterprise-focused giants such as IBM are transforming the way enterprises employ ML in every market sector, from

healthcare to enterprise analytics to cyber security [29]. On the other hand,

corporations like Gurucul, Niara, Splunk, StatusToday, Trudera, and Vectra Networks

are attempting to take the lead in innovative applications of UEBA [29]. Given the

rising trend of ML application in cyber security, Pavlakis further concludes that, “the

radical transformation is already underway and is occurring as a response to the

increasingly menacing nature of unknown threats and multiplicity of threat agents”

[29]. He further stressed that, “the proliferation of ML is also causing an explosion of

agile startups, such as JASK, focusing more on SIEM complementary network traffic

analysis and even pioneering application protection such as Sqreen” [29].

While AI technology has been around for some time, data science, aided by a sharp increase in computing power, has made astonishing progress over the past few years, enabling ML to be used in almost every aspect of IT security [28]. According to [28], ML has found numerous uses in contemporary cyber security applications, including:

1. Introducing new capabilities into enterprise security by incorporating sophisticated anomaly and fraud detection using UEBA.

2. Enabling corporations to customize their own data and deliver innovative monitoring applications (e.g., predicting behaviour for hard-to-detect vectors such as multi-layered attacks or insider threats).


3. Transforming the data mining methods of deriving actionable insights by

considering a larger percentage of already available variables (e.g., data

harvested from network traffic, endpoints, web crawlers, etc.).

4. Leveraging the power of vast repositories such as malware and virus databases

to support existing security domains.

5. Providing a quicker and more accurate platform to assist stretched IT and

security resources that may be time-pressured to combat the rising tide of

cyber-threats.

6. Complementing existing data loss prevention (DLP) strategies with reliable, self-learning security protocols capable of recognizing patterns and behaviours.

7. Assisting IT security personnel in their daily activities, thereby streamlining

the monitoring and decision-making process.

8. Helping design more accurate predictive models for threats, both inside and

outside the company, and capable of managing more critical attacks.

3.4 Understanding Machine Learning Models

It is useful to arrange the data mining domain into four layers. Figure 3.1

shows the four layers of data mining and ML.


Figure 3.1 The four layers of Data mining and Machine learning [30].

The first layer represents the target application; ML can benefit many applications, such as cyber intrusion detection and credit rating. The second layer represents ML tasks such as classification, regression and clustering. Each ML task can be accomplished using various ML models, as depicted in the third layer, and each model can in turn be induced from sample data using various learning algorithms (the fourth layer). A wide selection of ML algorithms can provide in-depth analysis of sample data; however, they achieve varying levels of accuracy depending on their individual capabilities. Commonly used ML models include neural networks, decision trees, random forests, association and sequence discovery, gradient boosting and bagging, support vector machines, nearest-neighbor mapping, k-means clustering, self-organizing maps, local search optimization techniques (e.g., genetic algorithms), expectation maximization, multivariate adaptive regression splines, Naïve Bayes, kernel density estimation, principal component analysis, singular value decomposition, Gaussian mixture models and sequential covering rule building [26]. For the purposes of this report, the focus is restricted to three of these models: neural networks, Naïve Bayes and decision trees.

3.4.1 Overview of Decision Tree

Generally, a decision tree is a popular predictive modeling approach used in statistics, data mining and ML [30]. Decision trees can represent both classifiers and regression models: where the target variable can take a finite set of values, they are referred to as classification trees, and where the target variable can take continuous values (typically real numbers), they are known as regression trees [30]. According to [30], classification trees are regularly used in applications such as finance, marketing, engineering and medicine, and they are most valuable as an exploratory technique. They do not attempt to replace existing traditional statistical methods, and there are many other techniques, such as artificial neural networks, that can be used to classify or predict the membership of instances in a predefined set of classes [30].

The decision tree is a very popular technique in data mining, and many researchers attribute this popularity to its simplicity and transparency [30]. Decision trees are self-explanatory; there is no need to be a data mining expert to follow a particular decision tree. Classification trees are typically represented graphically as hierarchical structures, which makes them easier to interpret than other techniques. However, according to [30], when a classification tree becomes complicated and clumsy, its graphical representation becomes ineffective.
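A decision tree learner grows the tree by repeatedly choosing the attribute that best separates the classes. The Python sketch below illustrates one common criterion, entropy-based information gain (as used by ID3/C4.5-style learners), on a small hypothetical set of packet records; the attribute names `protocol` and `size` and the records themselves are invented for illustration. WEKA's RandomTree, used in Chapter 4, additionally restricts each split to a randomly chosen subset of attributes, which this sketch does not attempt.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Reduction in entropy obtained by splitting on one nominal attribute."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attribute]].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels, attributes):
    """The attribute a tree learner would test at the root of this (sub)tree."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))

# Hypothetical packet records: the class depends only on the payload size band.
rows = [
    {"protocol": "ICMP", "size": "small"},
    {"protocol": "ICMP", "size": "large"},
    {"protocol": "TCP", "size": "small"},
    {"protocol": "TCP", "size": "large"},
    {"protocol": "UDP", "size": "small"},
    {"protocol": "UDP", "size": "large"},
]
labels = ["SAFE", "NOT_SAFE", "SAFE", "NOT_SAFE", "SAFE", "NOT_SAFE"]

print(best_attribute(rows, labels, ["protocol", "size"]))  # size
```

Here splitting on `size` yields pure branches (a gain of 1 bit), while `protocol` leaves every branch evenly mixed (a gain of 0), so `size` becomes the root test. A full learner would recurse on each branch until the leaves are pure or a stopping rule applies.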


3.4.2 Overview of Neural Networks

Modern neural networks are non-linear statistical data modeling tools modeled on biological neural networks [31]. Structurally, a neural network is built from layers of artificial neurons: computational units that receive input and apply an activation function, together with a threshold, to determine whether a signal is passed along [31]. Neural networks are often used to model complex relationships between inputs and outputs, to find patterns in data, or to capture the statistical structure in an unknown joint probability distribution between observed variables [31]. According to [31], neural networks are characterized by adaptive weights along the paths between neurons, which are tuned by a learning algorithm that learns from observed data in order to improve the model. The related algorithms form an integral part of ML and can be used in many applications, and the technique often achieves high accuracy. Neural networks use cost functions to learn the optimal solution to the problem being solved [31]. This is done by determining the best values for all the tunable parameters in the model, where the adaptive neuron path weights are the primary target, along with algorithm tuning parameters such as the learning rate [31]. This is usually carried out through optimization techniques such as gradient descent or stochastic gradient descent. Model architecture and tuning are the major components that give neural networks a significant performance advantage over other ML models [31]. The problem-solving capability of a network can be increased by adding hidden layers, neurons per layer or connections between neurons, although this can make the model increasingly complicated [31].
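The training process described above (weighted inputs, a sigmoid activation, a cost function and gradient descent controlled by a learning rate) can be illustrated with a single artificial neuron. The Python sketch below is a deliberate simplification and not WEKA's MultilayerPerceptron: it has no hidden layer, uses plain stochastic gradient descent on the cross-entropy cost, and trains on hypothetical, already-normalised inputs (an ICMP flag and a packet length scaled to the range 0 to 1).

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Output of one sigmoid neuron: estimated probability of class 1."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_neuron(xs, ys, learning_rate=0.5, epochs=2000, seed=0):
    """Stochastic gradient descent on the cross-entropy cost."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in xs[0]]
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            # For a sigmoid output with cross-entropy cost, dCost/dz = p - y.
            error = predict(weights, bias, x) - y
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias

# Hypothetical inputs: [is_icmp, packet length scaled to 0..1]; label 1 = NOT_SAFE.
xs = [[0, 0.05], [0, 0.10], [1, 0.06], [1, 0.90], [1, 0.95], [0, 0.80]]
ys = [0, 0, 0, 1, 1, 1]
weights, bias = train_neuron(xs, ys)
print([round(predict(weights, bias, x)) for x in xs])  # [0, 0, 0, 1, 1, 1]
```

A multilayer perceptron stacks such units and propagates the same gradient back through the hidden layer (backpropagation). For comparison, the WEKA run in Appendix C used a learning rate of 0.3 (`-L 0.3`) and momentum of 0.2 (`-M 0.2`), and its output lists four hidden sigmoid nodes (Nodes 2 to 5) feeding two output nodes.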


3.4.3 Overview of Naïve Bayes

Naïve Bayes is a classification method based on Bayes’ Theorem that relies on a simple probabilistic assumption of independence among predictors [32]. The classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. The Naïve Bayes model is particularly useful for very large data sets. Despite its simplicity, Naïve Bayes has in some cases been known to outperform far more sophisticated classification techniques [32], especially when the input is high-dimensional. Naïve Bayes has been researched extensively since the 1950s; it was introduced under a different name into the text retrieval community in the early 1960s and has remained a popular technique for text categorization ever since [32]. In the computer science and ML literature, Naïve Bayes models are known under an array of names, including simple Bayes and independence Bayes. All these names reference the use of Bayes’ theorem in the classifier’s decision rule, but Naïve Bayes is not necessarily a Bayesian method in itself [32]. Naïve Bayes classifiers are highly scalable, requiring a number of parameters linear in the number of variables (predictors) in a learning problem [32].
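The independence assumption can be made concrete with a short Python sketch. It scores a packet by multiplying each class prior by the likelihood of its Protocol value and of its Length under that class, as if the two attributes were independent, and picks the larger score. For illustration it reuses the class weight sums (631 and 132), the Protocol counts and the Length mean and standard deviation printed for the Naïve Bayes model in Appendix A. This is a simplification: Length is modelled with a plain Gaussian density, and the sketch does not claim to reproduce WEKA's internal estimators exactly.

```python
import math

# Figures as printed for the Naive Bayes model in Appendix A.
CLASS_WEIGHTS = {"SAFE": 631, "NOT_SAFE": 132}
PROTOCOL_COUNTS = {
    "SAFE": {"TCP": 378, "UDP": 41, "NBNS": 23, "HTTP": 64, "DB-LSP-DISC": 33, "ICMP": 98},
    "NOT_SAFE": {"TCP": 15, "UDP": 6, "NBNS": 1, "HTTP": 1, "DB-LSP-DISC": 1, "ICMP": 114},
}
LENGTH_STATS = {"SAFE": (275.6529, 372.9078), "NOT_SAFE": (876.0842, 367.2059)}

def gaussian_pdf(x, mean, std):
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

def class_scores(protocol, length):
    """Unnormalised posterior: P(class) * P(protocol | class) * p(length | class)."""
    total = sum(CLASS_WEIGHTS.values())
    scores = {}
    for cls, weight in CLASS_WEIGHTS.items():
        counts = PROTOCOL_COUNTS[cls]
        p_protocol = counts[protocol] / sum(counts.values())
        mean, std = LENGTH_STATS[cls]
        scores[cls] = (weight / total) * p_protocol * gaussian_pdf(length, mean, std)
    return scores

def classify(protocol, length):
    """Return the most probable class and the normalised class probabilities."""
    scores = class_scores(protocol, length)
    norm = sum(scores.values())
    return max(scores, key=scores.get), {cls: s / norm for cls, s in scores.items()}

print(classify("ICMP", 1000)[0])  # NOT_SAFE
print(classify("TCP", 100)[0])    # SAFE
```

In this illustration a 1000-byte ICMP packet is scored NOT_SAFE and a 100-byte TCP packet SAFE, consistent with Appendix A, where ICMP traffic and longer packets are far more common in the NOT_SAFE class.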

Applications of Naive Bayes’ Algorithms

• Real-Time Prediction: Naïve Bayes is a fast and effective learning classifier, which makes it suitable for making predictions in real time [32].

• Multiclass Prediction: This algorithm is well known for multi-class prediction; the classifier can predict the probability of each of multiple classes of the target variable.


• Text classification: Naïve Bayes classifiers are commonly used in text classification, where they often achieve higher accuracy and detection rates than other classifiers. For this reason, they are widely used in spam recognition and filtering. They are also applied in social media analysis, for example in sentiment analysis to identify positive and negative customer sentiments [32].

• Recommendation System: A Naïve Bayes classifier can be combined with collaborative filtering to build a recommendation system, which uses ML and data mining methods to uncover hidden information and predict whether a user would like a given resource or not [32].


CHAPTER 4. EMPIRICAL INVESTIGATION

In the previous chapters, trends in cyber incidents against SCADA and PCN were reviewed, and the possibility of using ML as a mitigation was explored. This chapter introduces the research concepts and methodology, and addresses the research design and approaches used.

4.1 Introduction

A comprehensive review has been conducted of the impact and trend of cyber incidents involving SCADA and PCN within the energy sector, and several ML frameworks have been discussed and conceptualized. The focus now turns to a practical case in which ML is applied. This chapter presents the research concept and methodology of the empirical investigation adopted to achieve the objectives of this thesis.

The methodology of this thesis is explained in the following steps:

1. Create data sets (using the Wireshark network packet analyzer);

2. Feature extraction of data (CSV format);

3. Combine and convert data (Attribute-Relation File Format [ARFF]); and

4. Interpretation of results:

a. Verification and validation.

The methodology that is explained in this chapter is depicted in Figure 4.1

below.


[Figure 4.1 flowchart: Create Data (Normal Data) and Create Data (Malicious Data) → Feature extraction of data (for each data set) → Combine and convert data → Naïve Bayes, Decision Tree and Neural Networks → Interpret Results]

Figure 4.1 Based on the generated data, ML models are created that can classify packets.

4.2 Creating the data sets

The data required for this research were generated and analyzed using the Wireshark and WEKA software packages. Wireshark, formerly known as Ethereal, is a network analysis tool used to capture network packets in real time and display them in human-readable format [33]. WEKA is an application containing a collection of ML algorithms that are typically used to solve real-world data mining problems [6]. It is written in Java and runs on almost any platform [6]. The algorithms can either be applied to a data set directly or called from the user’s own Java code [6]. Figure 4.2 below shows a screenshot of Wireshark during packet capture.


Figure 4.2 Wireshark screenshot during packet capture.

Two distinct sets of data were required: normal data and malicious data. Normal data was generated by launching the Wireshark application and capturing packets on the network interface card that connected the target machine to the Local Area Network (LAN). While the capture was in progress, the target machine was used normally, for activities such as web browsing, downloading files and emailing, for a minimum of 10 minutes [34]. Specific filters such as http, ICMP and telnet were applied to capture different types of traffic, so that the same training data set could be used in WEKA. Similarly, malicious data was generated by launching different vulnerability exploits from a remote machine against the target machine for a minimum of 10 minutes. The penetration testing framework Metasploit was used to simulate the malicious exploits [35]¹. The captured packets (normal data and malicious data) were exported from Wireshark as comma-separated values (CSV) files for data mining and analysis in the WEKA ML software.
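Once exported, each capture can be loaded and labelled programmatically before the two are combined. The Python sketch below assumes that the CSV header contains Wireshark's default packet-list columns (No., Time, Source, Destination, Protocol, Length, Info), keeps only the Protocol and Length columns that remain in the WEKA runs in the appendices, and attaches the class value SAFE or NOT_SAFE. Labelling every packet in a capture with a single class is a simplifying assumption; the labelling actually used in this research may have been finer grained.

```python
import csv
import io

def load_capture(csv_file, label):
    """Keep the Protocol and Length columns of a Wireshark CSV export and attach a class label."""
    records = []
    for row in csv.DictReader(csv_file):
        records.append({"Protocol": row["Protocol"], "Length": int(row["Length"]), "Result": label})
    return records

# A small inline sample standing in for an exported capture.
sample = io.StringIO(
    '"No.","Time","Source","Destination","Protocol","Length","Info"\n'
    '"1","0.000000","192.168.1.10","192.168.1.1","ICMP","74","Echo (ping) request"\n'
    '"2","0.000512","192.168.1.10","93.184.216.34","HTTP","410","GET / HTTP/1.1"\n'
)
packets = load_capture(sample, "SAFE")
print(packets[0])  # {'Protocol': 'ICMP', 'Length': 74, 'Result': 'SAFE'}
```

With real exports, the two files (for example, hypothetical `normal.csv` and `malicious.csv`, opened with `newline=""`) would be loaded with the labels SAFE and NOT_SAFE respectively and the two lists concatenated.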

4.3 Feature extraction of data (CSV format)

First, all the data must be gathered together into a set of instances; preparing the input for a data mining analysis usually consumes the bulk of the effort invested in the entire data mining process. The data packets captured with Wireshark were exported as CSV because the WEKA ML application does not support pcap files. Pcap (packet capture) refers both to an application programming interface (API) for capturing network traffic and to the file format in which captured packets are saved. Traffic categories were defined according to the application that generated the traffic, as shown in Table 4.1 below.

Table 4.1 Class Categorization.

Traffic Type          Traffic Category
TCP PORT 2869         LAN_COMMS
NBNS                  NetBios_TCP/IP
HTTP                  Browser
DB-LSP-DISC           Dropbox_Cloud
ICMP <= 74 BYTES      Normal_Ping
UDP PORT 5938         Teamviewer
ICMP > 74 BYTES       Abnormal_Ping (HIGH_PAYLOAD)
TCP/UDP PORT 1434     SQL_PAYLOAD_EXEC
UDP PORT 3478         Skype

¹ Metasploit is an exploit development framework initiated by H. D. Moore in 2003, which was later acquired by Rapid7. It is a tool used for the development of exploits and the testing of these exploits on live targets.
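The categorisation rules in Table 4.1 can be expressed as a small lookup function. In the Python sketch below, the port arguments, the order in which the rules are checked and the fallback value "Unclassified" are assumptions made for illustration; the table does not state whether source or destination ports were matched, so both are checked.

```python
from typing import Optional

def traffic_category(protocol: str, length: int,
                     src_port: Optional[int] = None, dst_port: Optional[int] = None) -> str:
    """Map one packet to a traffic category following Table 4.1."""
    proto = protocol.upper()
    ports = {p for p in (src_port, dst_port) if p is not None}
    if proto == "ICMP":
        # Pings of 74 bytes or less are normal; larger ones carry a high payload.
        return "Normal_Ping" if length <= 74 else "Abnormal_Ping"
    named = {"NBNS": "NetBios_TCP/IP", "HTTP": "Browser", "DB-LSP-DISC": "Dropbox_Cloud"}
    if proto in named:
        return named[proto]
    if proto in ("TCP", "UDP") and 1434 in ports:
        return "SQL_PAYLOAD_EXEC"
    if proto == "TCP" and 2869 in ports:
        return "LAN_COMMS"
    if proto == "UDP" and 5938 in ports:
        return "Teamviewer"
    if proto == "UDP" and 3478 in ports:
        return "Skype"
    return "Unclassified"  # hypothetical fallback, not part of Table 4.1
```

For example, `traffic_category("ICMP", 1042)` returns `"Abnormal_Ping"` and `traffic_category("UDP", 120, dst_port=1434)` returns `"SQL_PAYLOAD_EXEC"`. Checking ICMP length first mirrors the table, where payload size alone separates normal from abnormal pings.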

4.4 Combine and convert data (ARFF format)

WEKA expects the data file to be in ARFF format. Before an ML algorithm can be applied to the captured data, the data must therefore be converted into an ARFF file (a file with the .arff extension). Consequently, the exported CSV was imported and converted into ARFF format using the WEKA ARFF viewer so that it could be read by the WEKA application. Figure 4.3 below shows an ARFF file for the data packet information exported from the Wireshark application.


Figure 4.3 Combine and convert data (ARFF format).
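The conversion in this research was done interactively in the WEKA ARFF viewer. For readers who prefer to script the step, the Python sketch below writes the three attributes that remain in the WEKA runs in the appendices (Protocol, Length and Result) using the ARFF header keywords `@relation`, `@attribute` and `@data`. The quoting rule is a simplification that assumes names and values contain no single quotes or backslashes.

```python
def quote(value: str) -> str:
    """Leave simple tokens bare; wrap anything else in single quotes (no escaping is done)."""
    simple = value.replace("_", "").replace(".", "")
    return value if simple.isalnum() else f"'{value}'"

def to_arff(relation, records, protocols, classes=("SAFE", "NOT_SAFE")):
    """Build ARFF text with a nominal Protocol, a numeric Length and a nominal Result."""
    lines = [f"@relation {quote(relation)}", ""]
    lines.append("@attribute Protocol {" + ",".join(quote(p) for p in protocols) + "}")
    lines.append("@attribute Length numeric")
    lines.append("@attribute Result {" + ",".join(quote(c) for c in classes) + "}")
    lines += ["", "@data"]
    for r in records:
        lines.append(f"{quote(r['Protocol'])},{r['Length']},{quote(r['Result'])}")
    return "\n".join(lines) + "\n"

records = [
    {"Protocol": "ICMP", "Length": 1042, "Result": "NOT_SAFE"},
    {"Protocol": "HTTP", "Length": 410, "Result": "SAFE"},
]
protocols = ["TCP", "UDP", "NBNS", "HTTP", "DB-LSP-DISC", "ICMP"]
print(to_arff("Normal Data_Malicious Data", records, protocols))
```

Declaring Protocol and Result as nominal attributes with explicit value lists is what allows WEKA to treat them as categories, as seen in the `Protocol=TCP`, `Protocol=UDP` inputs of the Multilayer Perceptron model in Appendix C.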

4.5 Interpretation of results and findings

Although the WEKA ML software offers various techniques and classifiers for data analysis, this research focuses on only three classifiers, selected for their distinct features and capabilities, in order to provide a comparative and comprehensive assessment of the captured data. Classifiers in WEKA are models for predicting nominal or numeric quantities. The learning algorithms used in this research to analyze the captured data are the Naïve Bayes, Multilayer Perceptron and Random Tree classifiers. The two sets of data captured with Wireshark were used to train the classifier models within WEKA. Table 4.2 compares the WEKA output results of the three classifier models (Naïve Bayes, Multilayer Perceptron and Random Tree) used in this research.


Table 4.2 WEKA output results – Classifiers comparison analysis.

Output results                    NB (CV)   MLP (CV)  RT (CV)   NB (Split)  MLP (Split)  RT (Split)
Classification accuracy (%)       93.58     97.51     98.17     94.62       97.69        98.08
Error rate (%)                    6.42      2.49      1.83      5.38        2.31         1.92
Kappa statistic                   0.7827    0.9077    0.9335    0.8012      0.9076       0.9269
Mean absolute error               0.1014    0.0485    0.0315    0.0922      0.0464       0.0328
Root mean squared error           0.2239    0.156     0.135     0.2032      0.1501       0.1363
Relative absolute error (%)       35.35     16.92     10.99     32.65       16.41        11.60
Root relative squared error (%)   59.19     41.24     35.68     55.63       41.08        37.32

NB = Naïve Bayes; MLP = Multilayer Perceptron; RT = Random Tree; CV = 10-fold cross-validation; Split = percentage split.

The results in Table 4.2 reveal some interesting findings. Using only the “protocol and length” features, the Random Tree model achieved the highest classification accuracy: 98.17% under 10-fold cross-validation and 98.08% under percentage split analysis. On the same set of captured data, the Multilayer Perceptron achieved 97.51% and 97.69%, and Naïve Bayes 93.58% and 94.62%, respectively.
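The accuracy, error rate and kappa statistic in Table 4.2 can all be derived from a classifier's confusion matrix. The Python sketch below uses the standard definitions (kappa compares the observed agreement with the agreement expected by chance from the row and column totals) and the Random Tree cross-validation counts reported in Table 4.3 (630, 1, 13 and 119).

```python
def summarise(matrix):
    """Accuracy and error rate (both in %) and Cohen's kappa from a square confusion matrix.

    matrix[i][j] is the number of instances of actual class i predicted as class j.
    """
    n = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    observed = correct / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return 100 * observed, 100 * (1 - observed), kappa

# Random Tree, 10-fold cross-validation (rows: actual SAFE, actual NOT_SAFE).
accuracy, error_rate, kappa = summarise([[630, 1], [13, 119]])
print(f"{accuracy:.2f} {error_rate:.2f} {kappa:.4f}")  # 98.17 1.83 0.9335
```

Applied to the Naïve Bayes cross-validation matrix [[601, 30], [19, 113]], the same function gives the 93.58% accuracy and 0.7827 kappa reported by WEKA in Appendix A. Kappa is the more informative figure here because the classes are imbalanced: a large share of the accuracy would be expected by chance alone.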

It was necessary to run each algorithm using the 10-fold cross-validation technique. This ensures that the predictive ML model makes a prediction for every instance of the data set (each time using a model trained on different folds), and the presented result summarizes those predictions. The data set is divided into 10 parts: nine parts are used to train the algorithm and the tenth part is used to evaluate it. This process is repeated ten times, so that each part serves once as the test set. Finally, the results are combined (for example, averaged) to produce a single estimate.
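The fold mechanics can be sketched in Python as follows. Because WEKA labels its output "Stratified cross-validation" (Appendix A), the sketch deals the instances of each class round-robin into the folds so that every fold keeps roughly the original class proportions, and it pools the predictions from all folds into one accuracy figure. The `train_and_predict` callback, the seed and the majority-class baseline are hypothetical stand-ins for a real classifier; this is not WEKA's own fold-assignment code.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=10, seed=1):
    """Split instance indices into k folds with roughly equal class proportions."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for i in indices:  # deal this class's instances round-robin into the folds
            folds[position % k].append(i)
            position += 1
    return folds

def cross_validate(labels, train_and_predict, k=10):
    """Every instance is predicted once, by a model trained on the other k-1 folds."""
    correct = 0
    for test_idx in stratified_folds(labels, k):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in test_set]
        predictions = train_and_predict(train_idx, test_idx)
        correct += sum(p == labels[i] for p, i in zip(predictions, test_idx))
    return 100 * correct / len(labels)

# Class balance from Appendix A (631 SAFE, 132 NOT_SAFE).
labels = ["SAFE"] * 631 + ["NOT_SAFE"] * 132

def majority_class(train_idx, test_idx):
    """Baseline 'model' that always predicts the most frequent training class."""
    counts = defaultdict(int)
    for i in train_idx:
        counts[labels[i]] += 1
    return [max(counts, key=counts.get)] * len(test_idx)

print(round(cross_validate(labels, majority_class), 2))  # 82.7
```

The majority-class baseline keeps the sketch self-contained, and it also gives a useful reference point: on this class balance, a classifier that always predicts SAFE already scores about 82.7%, which is worth bearing in mind when reading the accuracies in Table 4.2.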

4.5.1 Verification and Validation

The results were verified and validated by repeating the model analysis using the percentage split technique. This technique trains the classifier on a given percentage of the data and evaluates how accurately it predicts the remaining instances, which are held out for testing. In this research, 66% of the data was used for training and the remaining 34% for testing (see Appendices B and D).
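A 66% percentage split can be sketched as follows. The Python code assumes that the instances are shuffled once with a fixed seed and that the training size is rounded half up to a whole number of instances; with the 767 instances listed in the appendices this gives 506 training and 261 test instances, which is consistent with the 260 test instances WEKA reports after ignoring one instance with an unknown class. The function name and seed are hypothetical.

```python
import random

def percentage_split(n_instances, train_percent=66.0, seed=1):
    """Shuffle instance indices, then keep the first train_percent% for training."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # int(x + 0.5) rounds half up for positive x, unlike round(), which rounds half to even.
    train_size = int(n_instances * train_percent / 100 + 0.5)
    return indices[:train_size], indices[train_size:]

train_idx, test_idx = percentage_split(767)
print(len(train_idx), len(test_idx))  # 506 261
```

Unlike cross-validation, a single split evaluates each model on only about a third of the data, so its figures are more sensitive to which instances happen to fall in the test set.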

The confusion matrices in Table 4.3 offer further evidence that the Random Tree model provides the most balanced classification results.


Table 4.3 WEKA output results – Confusion matrix analysis.

Rows give the actual class; columns give the predicted class (a = SAFE, b = NOT_SAFE).

Classifier (evaluation)                     Actual class   a (SAFE)   b (NOT_SAFE)
Naïve Bayes (Cross Validation)              SAFE           601        30
                                            NOT_SAFE       19         113
Naïve Bayes (% Split)                       SAFE           211        8
                                            NOT_SAFE       6          35
Multilayer Perceptron (Cross Validation)    SAFE           631        0
                                            NOT_SAFE       19         113
Multilayer Perceptron (% Split)             SAFE           219        0
                                            NOT_SAFE       6          35
Random Tree (Cross Validation)              SAFE           630        1
                                            NOT_SAFE       13         119
Random Tree (% Split)                       SAFE           217        2
                                            NOT_SAFE       3          38

Table 4.3 compares actual classes with predicted classes for each classifier. For the Random Tree model evaluated with the percentage split, only two SAFE instances were classified as NOT_SAFE and only three NOT_SAFE instances were classified as SAFE. A detailed output of the WEKA results is given in Appendices A, B, C, D, E and F.

4.6 Using active learning method of ML for Cyber security intrusion detection for PCN application

The WEKA ML outcome of each classifier model potentially demonstrates the suitability of adapting the ML framework to mitigate the growing trend of cyber-attacks on PCNs, particularly in the oil and gas industry.

Because no actual PCN infrastructure was available, it was impractical in some cases to simulate malicious data effectively, and this may have affected the detection accuracy of the ML algorithms. Legal constraints also prevented the use of actual PCN data in this experiment. The Wireshark data capture therefore provided an alternative way to demonstrate the potential use of ML, which generates rules automatically by analyzing Ethernet traffic similar to that transmitted within a PCN environment. It is further suggested that these rules could be implemented by PCN administrators or analysts on PCN firewalls, or simply used as triggers to generate alarms notifying of potential cyber security gaps in the PCN environment.

This research has provided a foundation on which future experimental work can build. One typical application to which its major outcomes could be adapted is the use of the active learning method of ML to further enhance data gathering for the detection of cyber intrusion. According to [36], active learning has been used with statistically-based learning architectures, and it has also been used in conjunction with Support Vector Machines (SVM) to improve detection accuracy with smaller training data sets [37]. As applied in this research, the idea of active learning is to create a collaboration between human and machine in which the machine tags data points and asks an annotator for confirmation.

Moreover, [38] adopted a similar framework to identify documents in a large pool that had qualities similar to those of the samples provided. According to [38], the concept involves an oracle (such as a PCN administrator) within a loop that iteratively improves the accuracy and performance of the classifier. The role of the oracle (the PCN administrator) is to provide precise labels for some of the instances (data packets) in order to give the classifier more information, and the classifier then updates its database with this information [38].

First, some instances with known labels (data packets) are introduced into the classifier or ML model, as depicted in Figure 4.4. The classifier then goes through the remaining data packets, whose labels (traffic rules) it does not know, and identifies the traffic for which it is difficult to establish whether it is “SAFE” or “NOT SAFE.” The oracle (PCN Administrator) then reviews these traffic rules and accepts or rejects each of them (in this case, acceptance means that the traffic is “SAFE,” whereas rejection means that the traffic is “NOT SAFE”).

Figure 4.4 Active Learning Process [38].

Given this information, the classifier re-trains and adjusts itself to suit the problem better. The loop continues until the PCN administrator is satisfied with the performance of the classifier. Figure 4.5 below depicts the process flow of this model.


Figure 4.5 Flowchart for Active Learning classification [38].
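The loop in Figures 4.4 and 4.5 can be illustrated with pool-based active learning using uncertainty sampling. The Python sketch below is hypothetical and is not the method of [38]: the classifier is a nearest-centroid model on packet length alone, "uncertainty" is the margin between the distances to the two class centroids, and the `oracle` function stands in for the PCN administrator who accepts (SAFE) or rejects (NOT_SAFE) each queried packet. The packet lengths and the oracle's 500-byte rule are invented for illustration.

```python
from typing import Callable, Dict, List

def train(lengths: List[float], labelled: Dict[int, str]) -> Dict[str, float]:
    """Nearest-centroid model: the mean packet length of each labelled class."""
    means = {}
    for cls in ("SAFE", "NOT_SAFE"):
        values = [lengths[i] for i, label in labelled.items() if label == cls]
        means[cls] = sum(values) / len(values)
    return means

def predict(means: Dict[str, float], x: float) -> str:
    return min(means, key=lambda cls: abs(x - means[cls]))

def margin(means: Dict[str, float], x: float) -> float:
    """A small margin means the packet lies near the decision boundary."""
    return abs(abs(x - means["SAFE"]) - abs(x - means["NOT_SAFE"]))

def active_learning(lengths, seed_labels, oracle: Callable[[int], str], rounds=3, batch=1):
    labelled = dict(seed_labels)
    for _ in range(rounds):
        means = train(lengths, labelled)
        pool = [i for i in range(len(lengths)) if i not in labelled]
        if not pool:
            break
        pool.sort(key=lambda i: margin(means, lengths[i]))  # most uncertain first
        for i in pool[:batch]:
            labelled[i] = oracle(i)  # the administrator confirms the label
    return train(lengths, labelled), labelled

# Hypothetical packet lengths; the oracle's ground truth is "over 500 bytes is NOT_SAFE".
lengths = [60, 70, 80, 400, 450, 550, 600, 1000, 1100, 1200]

def oracle(i: int) -> str:
    return "NOT_SAFE" if lengths[i] > 500 else "SAFE"

model, labelled = active_learning(lengths, {0: "SAFE", 9: "NOT_SAFE"}, oracle)
print(sorted(labelled))  # [0, 4, 5, 6, 9]
```

With only the two seed labels, the 550- and 600-byte packets fall on the SAFE side of the boundary. After three queries, chosen where the model was least certain, all ten hypothetical packets are classified as the oracle would label them, while the administrator has labelled only three of the eight unlabelled packets.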


CHAPTER 5. CONCLUSION

Chapter 4 presented the findings of ML assessments on Ethernet data comprising normal and malicious traffic. This chapter concludes the overall research investigation. The contributions of the thesis are indicated, and recommendations and future research opportunities are identified.

5.1 Conclusion and Recommendations


This research had two main objectives:

1. To assess the rising incidents of cyber security exploits on PCNs in the

Petrochemical industry.

2. To develop and formulate a framework that will mitigate these rising

occurrences using a hybrid approach.

Although the data set used in this research is small, the experiments revealed results from which valuable deductions can be made. The ML analysis was conducted on a set of data packets containing a mix of normal and malicious traffic. The analysis showed that the “Traffic_Category” feature is the key attribute of reference used in the analysis. Using only the “protocol and length” features, the Random Tree model achieved a classification accuracy of approximately 98% under both 10-fold cross-validation and percentage split analysis, while the Multilayer Perceptron and Naïve Bayes obtained classification accuracies of approximately 97.5% and 94%, respectively, on the same set of captured data. These results suggest that such classifiers could be implemented alongside traditional PCN risk standards and conventional firewalls, as triggers that dynamically generate alarms to notify of potential cyber security gaps within the PCN environment.

Nevertheless, an error rate of approximately 2% in the best ML classifier is large enough to allow a significant cyber exploit with severe impact. It is therefore recommended that this experimental work be advanced further in an actual PCN environment using the concept of active learning. With this concept, ML creates a collaboration between human and machine in which the machine tags data points and asks the PCN Administrator for confirmation, further enhancing data gathering for the detection of cyber intrusion.


APPENDIX. A

NAÏVE BAYES (CROSS VALIDATION)

=== Run information ===

Scheme: weka.classifiers.bayes.NaiveBayes

Relation: Normal Data_Malicious Data(Comma)-weka.filters.unsupervised.attribute.Remove-R1-weka.filters.AllFilter-weka.filters.MultiFilter-Fweka.filters.AllFilter-weka.filters.unsupervised.attribute.Remove-R7-weka.filters.unsupervised.attribute.Remove-R5-weka.filters.unsupervised.attribute.Remove-R1-2

Instances: 767

Attributes: 3

Protocol

Length

Result

Test mode: 10-fold cross-validation

=== Classifier model (full training set) ===

Naive Bayes Classifier

Class

Attribute SAFE NOT_SAFE

(0.83) (0.17)

==================================

Protocol

TCP 378.0 15.0

UDP 41.0 6.0

NBNS 23.0 1.0

HTTP 64.0 1.0

DB-LSP-DISC 33.0 1.0

ICMP 98.0 114.0

[total] 637.0 138.0


Length

mean 275.6529 876.0842

std. dev. 372.9078 367.2059

weight sum 631 132

precision 8.2139 8.2139

Time taken to build model: 0.05 seconds

=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 714 93.578 %

Incorrectly Classified Instances 49 6.422 %

Kappa statistic 0.7827

Mean absolute error 0.1014

Root mean squared error 0.2239

Relative absolute error 35.3463 %

Root relative squared error 59.1885 %

Total Number of Instances 763

Ignored Class Unknown Instances 4

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class

0.952 0.144 0.969 0.952 0.961 0.784 0.878 0.934 SAFE

0.856 0.048 0.790 0.856 0.822 0.784 0.888 0.884 NOT_SAFE

Weighted Avg. 0.936 0.127 0.938 0.936 0.937 0.784 0.879 0.925

=== Confusion Matrix ===

a b <-- classified as

601 30 | a = SAFE

19 113 | b = NOT_SAFE


APPENDIX. B

NAÏVE BAYES (PERCENTAGE SPLIT)

=== Run information ===

Scheme: weka.classifiers.bayes.NaiveBayes

Relation: Normal Data_Malicious Data(Comma)-weka.filters.unsupervised.attribute.Remove-R1-weka.filters.AllFilter-weka.filters.MultiFilter-Fweka.filters.AllFilter-weka.filters.unsupervised.attribute.Remove-R7-weka.filters.unsupervised.attribute.Remove-R5-weka.filters.unsupervised.attribute.Remove-R1-2

Instances: 767

Attributes: 3

Protocol

Length

Result

Test mode: split 66.0% train, remainder test

=== Classifier model (full training set) ===

Naive Bayes Classifier

Class

Attribute SAFE NOT_SAFE

(0.83) (0.17)

==================================

Protocol

TCP 378.0 15.0

UDP 41.0 6.0

NBNS 23.0 1.0

HTTP 64.0 1.0

DB-LSP-DISC 33.0 1.0

ICMP 98.0 114.0

[total] 637.0 138.0


Length

mean 275.6529 876.0842

std. dev. 372.9078 367.2059

weight sum 631 132

precision 8.2139 8.2139

Time taken to build model: 0 seconds

=== Evaluation on test split ===

Time taken to test model on test split: 0 seconds

=== Summary ===

Correctly Classified Instances 246 94.6154 %

Incorrectly Classified Instances 14 5.3846 %

Kappa statistic 0.8012

Mean absolute error 0.0922

Root mean squared error 0.2032

Relative absolute error 32.6475 %

Root relative squared error 55.6315 %

Total Number of Instances 260

Ignored Class Unknown Instances 1

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class

0.963 0.146 0.972 0.963 0.968 0.802 0.904 0.960 SAFE

0.854 0.037 0.814 0.854 0.833 0.802 0.901 0.876 NOT_SAFE

Weighted Avg. 0.946 0.129 0.947 0.946 0.947 0.802 0.903 0.947

=== Confusion Matrix ===

a b <-- classified as

211 8 | a = SAFE

6 35 | b = NOT_SAFE


APPENDIX. C

MULTILAYER PERCEPTRON (CROSS VALIDATION)

=== Run information ===

Scheme: weka.classifiers.functions.MultilayerPerceptron -L 0.3 -M 0.2 -N 500 -V 0 -S 0 -E 20 -H a

Relation: Normal Data_Malicious Data(Comma)-weka.filters.unsupervised.attribute.Remove-R1-weka.filters.AllFilter-weka.filters.MultiFilter-Fweka.filters.AllFilter-weka.filters.unsupervised.attribute.Remove-R7-weka.filters.unsupervised.attribute.Remove-R5-weka.filters.unsupervised.attribute.Remove-R1-2

Instances: 767

Attributes: 3

Protocol

Length

Result

Test mode: 10-fold cross-validation

=== Classifier model (full training set) ===

Sigmoid Node 0

Inputs Weights

Threshold -6.701609106073109

Node 2 2.022658388075181

Node 3 3.4115657853982126

Node 4 2.3661383238993885

Node 5 2.4535891235346226

Sigmoid Node 1

Inputs Weights

Threshold 6.706022284568894

Node 2 -2.0664535731177893

Node 3 -3.380110415895844


Node 4 -2.3906272856469992

Node 5 -2.4212852245189156

Sigmoid Node 2

Inputs Weights

Threshold 1.2389105849750572

Attrib Protocol=TCP 1.5562765371242953

Attrib Protocol=UDP -1.5475321168186755

Attrib Protocol=NBNS -1.130914466657216

Attrib Protocol=HTTP 0.6447635193787378

Attrib Protocol=DB-LSP-DISC -1.1292570967597721

Attrib Protocol=ICMP -3.4330801543448985

Attrib Length -4.958940391477846

Sigmoid Node 3

Inputs Weights

Threshold 1.4384081759499916

Attrib Protocol=TCP 1.7757717668743795

Attrib Protocol=UDP -1.5234010753801517

Attrib Protocol=NBNS -1.3467652742850982

Attrib Protocol=HTTP 0.7816564846484115

Attrib Protocol=DB-LSP-DISC -1.265188338070933

Attrib Protocol=ICMP -4.150582386758663

Attrib Length -5.911288725560784

Sigmoid Node 4

Inputs Weights

Threshold 1.3317486924991702

Attrib Protocol=TCP 1.6211890650197724

Attrib Protocol=UDP -1.4922782883420325

Attrib Protocol=NBNS -1.177109463050558

Attrib Protocol=HTTP 0.6804947543129244

Attrib Protocol=DB-LSP-DISC -1.1857817570997022

Attrib Protocol=ICMP -3.630809553738711

Attrib Length -5.222928884763834

Sigmoid Node 5

Inputs Weights


Threshold 1.2946085326798238

Attrib Protocol=TCP 1.6498330921541253

Attrib Protocol=UDP -1.5552136751666403

Attrib Protocol=NBNS -1.2520708647725358

Attrib Protocol=HTTP 0.7043422292652405

Attrib Protocol=DB-LSP-DISC -1.1237095075576766

Attrib Protocol=ICMP -3.6403819042448466

Attrib Length -5.261345235018875

Class SAFE

Input

Node 0

Class NOT_SAFE

Input

Node 1

Time taken to build model: 0.55 seconds

=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 744 97.5098 %

Incorrectly Classified Instances 19 2.4902 %

Kappa statistic 0.9077

Mean absolute error 0.0485

Root mean squared error 0.156

Relative absolute error 16.9239 %

Root relative squared error 41.2421 %

Total Number of Instances 763

Ignored Class Unknown Instances 4

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class

1.000 0.144 0.971 1.000 0.985 0.912 0.879 0.938 SAFE

0.856 0.000 1.000 0.856 0.922 0.912 0.888 0.883 NOT_SAFE

Weighted Avg. 0.975 0.119 0.976 0.975 0.974 0.912 0.881 0.929


=== Confusion Matrix ===

a b <-- classified as

631 0 | a = SAFE

19 113 | b = NOT_SAFE


APPENDIX. D

MULTILAYER PERCEPTRON (PERCENTAGE SPLIT)

=== Run information ===

Scheme: weka.classifiers.functions.MultilayerPerceptron -L 0.3 -M 0.2 -N 500 -V 0 -S 0 -E 20 -H a

Relation: Normal Data_Malicious Data(Comma)-weka.filters.unsupervised.attribute.Remove-R1-weka.filters.AllFilter-weka.filters.MultiFilter-Fweka.filters.AllFilter-weka.filters.unsupervised.attribute.Remove-R7-weka.filters.unsupervised.attribute.Remove-R5-weka.filters.unsupervised.attribute.Remove-R1-2

Instances: 767

Attributes: 3

Protocol

Length

Result

Test mode: split 66.0% train, remainder test

=== Classifier model (full training set) ===

Sigmoid Node 0

Inputs Weights

Threshold -6.701609106073109

Node 2 2.022658388075181

Node 3 3.4115657853982126

Node 4 2.3661383238993885

Node 5 2.4535891235346226

Sigmoid Node 1

Inputs Weights

Threshold 6.706022284568894

Node 2 -2.0664535731177893

Node 3 -3.380110415895844

Node 4 -2.3906272856469992


Node 5 -2.4212852245189156

Sigmoid Node 2

Inputs Weights

Threshold 1.2389105849750572

Attrib Protocol=TCP 1.5562765371242953

Attrib Protocol=UDP -1.5475321168186755

Attrib Protocol=NBNS -1.130914466657216

Attrib Protocol=HTTP 0.6447635193787378

Attrib Protocol=DB-LSP-DISC -1.1292570967597721

Attrib Protocol=ICMP -3.4330801543448985

Attrib Length -4.958940391477846

Sigmoid Node 3

Inputs Weights

Threshold 1.4384081759499916

Attrib Protocol=TCP 1.7757717668743795

Attrib Protocol=UDP -1.5234010753801517

Attrib Protocol=NBNS -1.3467652742850982

Attrib Protocol=HTTP 0.7816564846484115

Attrib Protocol=DB-LSP-DISC -1.265188338070933

Attrib Protocol=ICMP -4.150582386758663

Attrib Length -5.911288725560784

Sigmoid Node 4

Inputs Weights

Threshold 1.3317486924991702

Attrib Protocol=TCP 1.6211890650197724

Attrib Protocol=UDP -1.4922782883420325

Attrib Protocol=NBNS -1.177109463050558

Attrib Protocol=HTTP 0.6804947543129244

Attrib Protocol=DB-LSP-DISC -1.1857817570997022

Attrib Protocol=ICMP -3.630809553738711

Attrib Length -5.222928884763834

Sigmoid Node 5

Inputs Weights

Threshold 1.2946085326798238


Attrib Protocol=TCP 1.6498330921541253

Attrib Protocol=UDP -1.5552136751666403

Attrib Protocol=NBNS -1.2520708647725358

Attrib Protocol=HTTP 0.7043422292652405

Attrib Protocol=DB-LSP-DISC -1.1237095075576766

Attrib Protocol=ICMP -3.6403819042448466

Attrib Length -5.261345235018875

Class SAFE

Input

Node 0

Class NOT_SAFE

Input

Node 1

Time taken to build model: 0.55 seconds

=== Evaluation on test split ===

Time taken to test model on test split: 0 seconds

=== Summary ===

Correctly Classified Instances 254 97.6923 %

Incorrectly Classified Instances 6 2.3077 %

Kappa statistic 0.9076

Mean absolute error 0.0464

Root mean squared error 0.1501

Relative absolute error 16.4131 %

Root relative squared error 41.0833 %

Total Number of Instances 260

Ignored Class Unknown Instances 1

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
                 1.000    0.146    0.973      1.000   0.986      0.912  0.904     0.960     SAFE
                 0.854    0.000    1.000      0.854   0.921      0.912  0.901     0.877     NOT_SAFE
Weighted Avg.    0.977    0.123    0.978      0.977   0.976      0.912  0.903     0.947


=== Confusion Matrix ===

   a   b   <-- classified as
 219   0 |   a = SAFE
   6  35 |   b = NOT_SAFE
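The network printed in this appendix has four hidden sigmoid nodes (Nodes 2 to 5), which read the protocol indicators and Length, and two sigmoid output nodes (Node 0 for SAFE, Node 1 for NOT_SAFE). Each node computes the sigmoid of its threshold plus the weighted sum of its inputs. The Python sketch below reproduces that forward pass with the printed weights. It then normalises the two outputs so that they sum to one and takes the larger as the prediction, which is how we understand Weka to form a class distribution. This is an illustration under stated assumptions, not a replacement for the Weka model. By default, Weka normalises attributes internally, so we assume that every input has already been scaled to roughly [-1, 1]. Raw packet lengths in bytes cannot be passed in directly, because the minimum and maximum used for that scaling are not reported. The `encode` helper and its ±1 indicator values are hypothetical.

```python
import math

# Order of the binary protocol inputs in the Weka listing.
PROTOCOLS = ("TCP", "UDP", "NBNS", "HTTP", "DB-LSP-DISC", "ICMP")

# Hidden sigmoid Nodes 2-5, copied from the listing:
# (threshold, weights for PROTOCOLS in order, weight for Length)
HIDDEN_NODES = [
    (1.2389105849750572,  # Node 2
     (1.5562765371242953, -1.5475321168186755, -1.130914466657216,
      0.6447635193787378, -1.1292570967597721, -3.4330801543448985),
     -4.958940391477846),
    (1.4384081759499916,  # Node 3
     (1.7757717668743795, -1.5234010753801517, -1.3467652742850982,
      0.7816564846484115, -1.265188338070933, -4.150582386758663),
     -5.911288725560784),
    (1.3317486924991702,  # Node 4
     (1.6211890650197724, -1.4922782883420325, -1.177109463050558,
      0.6804947543129244, -1.1857817570997022, -3.630809553738711),
     -5.222928884763834),
    (1.2946085326798238,  # Node 5
     (1.6498330921541253, -1.5552136751666403, -1.2520708647725358,
      0.7043422292652405, -1.1237095075576766, -3.6403819042448466),
     -5.261345235018875),
]

# Output sigmoid Nodes 0 (SAFE) and 1 (NOT_SAFE):
# (threshold, weights for hidden Nodes 2, 3, 4, 5)
OUTPUT_NODES = {
    "SAFE": (-6.701609106073109,
             (2.022658388075181, 3.4115657853982126,
              2.3661383238993885, 2.4535891235346226)),
    "NOT_SAFE": (6.706022284568894,
                 (-2.0664535731177893, -3.380110415895844,
                  -2.3906272856469992, -2.4212852245189156)),
}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def encode(protocol, on=1.0, off=-1.0):
    """Hypothetical encoding: the chosen protocol gets `on`, all others `off`."""
    if protocol not in PROTOCOLS:
        raise ValueError(f"unknown protocol: {protocol}")
    return tuple(on if p == protocol else off for p in PROTOCOLS)


def predict(protocol_inputs, length_input):
    """Forward pass on inputs ASSUMED to be already scaled as Weka scaled them."""
    hidden = []
    for threshold, protocol_weights, length_weight in HIDDEN_NODES:
        net = threshold + length_weight * length_input
        net += sum(w * x for w, x in zip(protocol_weights, protocol_inputs))
        hidden.append(sigmoid(net))
    raw = {
        label: sigmoid(threshold + sum(w * h for w, h in zip(weights, hidden)))
        for label, (threshold, weights) in OUTPUT_NODES.items()
    }
    # Normalise the two output activations into a class distribution.
    total = sum(raw.values())
    distribution = {label: value / total for label, value in raw.items()}
    return max(distribution, key=distribution.get), distribution
```

Under these assumptions, `predict(encode("ICMP"), 1.0)` (a long ICMP packet) is labelled NOT_SAFE, and `predict(encode("TCP"), -1.0)` (a short TCP packet) is labelled SAFE. Both results follow the signs of the Length and ICMP weights in the listing.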


APPENDIX. E

RANDOM TREE (CROSS VALIDATION)

=== Run information ===

Scheme: weka.classifiers.trees.RandomTree -K 0 -M 1.0 -V 0.001 -S 1

Relation: Normal Data_Malicious Data(Comma)-weka.filters.unsupervised.attribute.Remove-R1-weka.filters.AllFilter-weka.filters.MultiFilter-Fweka.filters.AllFilter-weka.filters.unsupervised.attribute.Remove-R7-weka.filters.unsupervised.attribute.Remove-R5-weka.filters.unsupervised.attribute.Remove-R1-2

Instances: 767

Attributes: 3

Protocol

Length

Result

Test mode: 10-fold cross-validation

=== Classifier model (full training set) ===

RandomTree

==========

Length < 797

| Length < 67

| | Length < 48.5 : NOT_SAFE (2/0)

| | Length >= 48.5

| | | Length < 56.5 : SAFE (22/0)

| | | Length >= 56.5

| | | | Length < 59

| | | | | Protocol = TCP : NOT_SAFE (1/0)

| | | | | Protocol = UDP : SAFE (1/0)

| | | | | Protocol = NBNS : SAFE (0/0)

| | | | | Protocol = HTTP : SAFE (0/0)

| | | | | Protocol = DB-LSP-DISC : SAFE (0/0)


| | | | | Protocol = ICMP : SAFE (0/0)

| | | | Length >= 59

| | | | | Protocol = TCP

| | | | | | Length < 61 : SAFE (92/7)

| | | | | | Length >= 61

| | | | | | | Length < 64 : NOT_SAFE (2/0)

| | | | | | | Length >= 64 : SAFE (48/4)

| | | | | Protocol = UDP : SAFE (2/1)

| | | | | Protocol = NBNS : SAFE (0/0)

| | | | | Protocol = HTTP : SAFE (0/0)

| | | | | Protocol = DB-LSP-DISC : SAFE (0/0)

| | | | | Protocol = ICMP : SAFE (0/0)

| Length >= 67

| | Protocol = TCP : SAFE (181/0)

| | Protocol = UDP

| | | Length < 206

| | | | Length < 164 : SAFE (13/0)

| | | | Length >= 164 : NOT_SAFE (2/0)

| | | Length >= 206 : SAFE (25/0)

| | Protocol = NBNS : SAFE (22/0)

| | Protocol = HTTP : SAFE (41/0)

| | Protocol = DB-LSP-DISC : SAFE (32/0)

| | Protocol = ICMP : SAFE (97/0)

Length >= 797

| Protocol = TCP : SAFE (46/0)

| Protocol = UDP : SAFE (0/0)

| Protocol = NBNS : SAFE (0/0)

| Protocol = HTTP : SAFE (21/0)

| Protocol = DB-LSP-DISC : SAFE (0/0)

| Protocol = ICMP : NOT_SAFE (113/0)

Size of the tree : 43

Time taken to build model: 0 seconds

=== Stratified cross-validation ===

=== Summary ===


Correctly Classified Instances 749 98.1651 %

Incorrectly Classified Instances 14 1.8349 %

Kappa statistic 0.9335

Mean absolute error 0.0315

Root mean squared error 0.135

Relative absolute error 10.986 %

Root relative squared error 35.6796 %

Total Number of Instances 763

Ignored Class Unknown Instances 4

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
                 0.998    0.098    0.980      0.998   0.989      0.935  0.963     0.987     SAFE
                 0.902    0.002    0.992      0.902   0.944      0.935  0.975     0.933     NOT_SAFE
Weighted Avg.    0.982    0.082    0.982      0.982   0.981      0.935  0.965     0.977

=== Confusion Matrix ===

   a   b   <-- classified as
 630   1 |   a = SAFE
  13 119 |   b = NOT_SAFE
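Because the RandomTree model is small, it can be transcribed by hand into ordinary conditionals, which makes its decision rules easy to audit. The Python function below is our own transcription of the tree printed in this appendix; Weka did not generate it. It flattens some nested branches, but it returns the same label as the listing for every combination of Length and the six Protocol values. Leaves shown as (0/0) appear to have received no training instances, but their printed labels are kept. Rejecting protocol values outside the six seen in training is our own design choice.

```python
KNOWN_PROTOCOLS = {"TCP", "UDP", "NBNS", "HTTP", "DB-LSP-DISC", "ICMP"}


def classify(length, protocol):
    """Follow the RandomTree listing and return "SAFE" or "NOT_SAFE"."""
    if protocol not in KNOWN_PROTOCOLS:
        raise ValueError(f"protocol not seen in training: {protocol}")
    if length >= 797:
        # Long packets: only the ICMP branch is NOT_SAFE (113/0 in the listing).
        return "NOT_SAFE" if protocol == "ICMP" else "SAFE"
    if length >= 67:
        # 67 <= Length < 797: only UDP packets of 164 to 205 bytes are NOT_SAFE.
        if protocol == "UDP" and 164 <= length < 206:
            return "NOT_SAFE"
        return "SAFE"
    # Length < 67
    if length < 48.5:
        return "NOT_SAFE"
    if length < 56.5:
        return "SAFE"
    if length < 59:
        return "NOT_SAFE" if protocol == "TCP" else "SAFE"
    # 59 <= Length < 67: only TCP packets of 61 to 63 bytes are NOT_SAFE.
    if protocol == "TCP" and 61 <= length < 64:
        return "NOT_SAFE"
    return "SAFE"
```

For example, `classify(1000, "ICMP")` returns `"NOT_SAFE"`, while `classify(1000, "TCP")` returns `"SAFE"`.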


APPENDIX. F

RANDOM TREE (PERCENTAGE SPLIT)

=== Run information ===

Scheme: weka.classifiers.trees.RandomTree -K 0 -M 1.0 -V 0.001 -S 1

Relation: Normal Data_Malicious Data(Comma)-weka.filters.unsupervised.attribute.Remove-R1-weka.filters.AllFilter-weka.filters.MultiFilter-Fweka.filters.AllFilter-weka.filters.unsupervised.attribute.Remove-R7-weka.filters.unsupervised.attribute.Remove-R5-weka.filters.unsupervised.attribute.Remove-R1-2

Instances: 767

Attributes: 3

Protocol

Length

Result

Test mode: split 66.0% train, remainder test

=== Classifier model (full training set) ===

RandomTree

==========

Length < 797

| Length < 67

| | Length < 48.5 : NOT_SAFE (2/0)

| | Length >= 48.5

| | | Length < 56.5 : SAFE (22/0)

| | | Length >= 56.5

| | | | Length < 59

| | | | | Protocol = TCP : NOT_SAFE (1/0)

| | | | | Protocol = UDP : SAFE (1/0)

| | | | | Protocol = NBNS : SAFE (0/0)

| | | | | Protocol = HTTP : SAFE (0/0)

| | | | | Protocol = DB-LSP-DISC : SAFE (0/0)


| | | | | Protocol = ICMP : SAFE (0/0)

| | | | Length >= 59

| | | | | Protocol = TCP

| | | | | | Length < 61 : SAFE (92/7)

| | | | | | Length >= 61

| | | | | | | Length < 64 : NOT_SAFE (2/0)

| | | | | | | Length >= 64 : SAFE (48/4)

| | | | | Protocol = UDP : SAFE (2/1)

| | | | | Protocol = NBNS : SAFE (0/0)

| | | | | Protocol = HTTP : SAFE (0/0)

| | | | | Protocol = DB-LSP-DISC : SAFE (0/0)

| | | | | Protocol = ICMP : SAFE (0/0)

| Length >= 67

| | Protocol = TCP : SAFE (181/0)

| | Protocol = UDP

| | | Length < 206

| | | | Length < 164 : SAFE (13/0)

| | | | Length >= 164 : NOT_SAFE (2/0)

| | | Length >= 206 : SAFE (25/0)

| | Protocol = NBNS : SAFE (22/0)

| | Protocol = HTTP : SAFE (41/0)

| | Protocol = DB-LSP-DISC : SAFE (32/0)

| | Protocol = ICMP : SAFE (97/0)

Length >= 797

| Protocol = TCP : SAFE (46/0)

| Protocol = UDP : SAFE (0/0)

| Protocol = NBNS : SAFE (0/0)

| Protocol = HTTP : SAFE (21/0)

| Protocol = DB-LSP-DISC : SAFE (0/0)

| Protocol = ICMP : NOT_SAFE (113/0)

Size of the tree : 43

Time taken to build model: 0 seconds

=== Evaluation on test split ===

Time taken to test model on test split: 0 seconds


=== Summary ===

Correctly Classified Instances 255 98.0769 %

Incorrectly Classified Instances 5 1.9231 %

Kappa statistic 0.9269

Mean absolute error 0.0328

Root mean squared error 0.1363

Relative absolute error 11.5984 %

Root relative squared error 37.3158 %

Total Number of Instances 260

Ignored Class Unknown Instances 1

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
                 0.991    0.073    0.986      0.991   0.989      0.927  0.988     0.996     SAFE
                 0.927    0.009    0.950      0.927   0.938      0.927  0.986     0.899     NOT_SAFE
Weighted Avg.    0.981    0.063    0.981      0.981   0.981      0.927  0.988     0.981

=== Confusion Matrix ===

   a   b   <-- classified as
 217   2 |   a = SAFE
   3  38 |   b = NOT_SAFE
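The per-class rows and the "Weighted Avg." row of the detailed accuracy table can likewise be derived from the confusion matrix. The Python sketch below does this for the matrix above. It computes each class's TP rate (recall), FP rate, precision, F-measure and MCC on a one-versus-rest basis. The weighted average weights each class by its number of actual instances, and this reproduces the table above to three decimal places. The ROC and PRC areas depend on predicted probabilities and are not reproduced.

The helper `split_sizes` shows where the 260 test instances come from. It assumes that the Explorer rounds 66% of the 767 instances half-up to a whole training-set size, as Java's `Math.round` does. That gives 506 training and 261 test instances, and the one test instance with an unknown class is then ignored. We have not confirmed this rounding against the Weka source, but it is consistent with the figures reported here.

```python
import math


def per_class_metrics(matrix, labels):
    """Weka-style per-class metrics and their support-weighted averages.

    matrix[i][j] counts instances of actual class i predicted as class j.
    """
    k = len(labels)
    n = sum(sum(row) for row in matrix)
    actual = [sum(row) for row in matrix]
    predicted = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    stats = {}
    for c, label in enumerate(labels):
        tp = matrix[c][c]
        fp = predicted[c] - tp  # other classes predicted as c
        fn = actual[c] - tp     # class c predicted as another class
        tn = n - tp - fp - fn
        recall = tp / actual[c]  # Weka reports this as both "TP Rate" and "Recall"
        precision = tp / predicted[c] if predicted[c] else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
        denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        stats[label] = {
            "tp_rate": recall,
            "fp_rate": fp / (n - actual[c]),
            "precision": precision,
            "f_measure": f_measure,
            "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        }
    # "Weighted Avg." weights each class by its number of actual instances.
    weighted = {
        metric: sum(actual[c] * stats[label][metric]
                    for c, label in enumerate(labels)) / n
        for metric in stats[labels[0]]
    }
    return stats, weighted


def split_sizes(num_instances, percent, unknown_class_in_test):
    """Train/test sizes for a percentage split, rounding half-up (assumed)."""
    train = math.floor(num_instances * percent / 100 + 0.5)
    test = num_instances - train
    return train, test, test - unknown_class_in_test


if __name__ == "__main__":
    stats, weighted = per_class_metrics([[217, 2], [3, 38]], ["SAFE", "NOT_SAFE"])
    for label, row in stats.items():
        print(label, {m: round(v, 3) for m, v in row.items()})
    print("Weighted Avg.", {m: round(v, 3) for m, v in weighted.items()})
    print(split_sizes(767, 66.0, 1))
```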


REFERENCES

[1] Department of Homeland Security (DHS), February 2016. [Online]. Available: https://www.dhs.gov/cybersecurity-overview.

[2] C. Wueest, “Security Response: Targeted Attacks against the Energy Sector,” CA, USA, 2014.

[3] Allied Business Intelligence (ABI) Research, January 2013. [Online]. Available: https://www.abiresearch.com/press/cyber-attacks-agains.

[4] E. Knapp, “Cyber security in process plants: Recognizing risks, addressing current threats,” 2015.

[5] Parsons, “Cybersecurity threats to the Oil & Gas Industry: Are you at Risk?,” 2015.

[6] Machine Learning Group at the University of Waikato, February 2017. [Online]. Available: http://www.cs.waikato.ac.nz/~ml/weka.

[7] M. Michela and C. Stuart, “Critical Infrastructure Security – Oil and Gas,” 2013.

[8] Allied Business Intelligence (ABI) Research, “PetroSecurity in the Digital Era: Legacy Systems vs. Cyber Threats,” 2013.

[9] W. Peter. [Online]. Available: https://www.eniday.com/en/sparks_en/cyber-threat-oil-and-gas-industry/.

[10] H. Abhiram, “Cyber Risk for Energy/Power Industry,” AON Risk Solution, India, 2016.

[11] S. Peerlkamp and M. Nieuwenhuis, “Process Control Network Security: Comparing frameworks to mitigate the specific threats to Process Control Network,” Amsterdam, 2010.

[12] M. Bill and R. Dale, “A Survey of SCADA and Critical Infrastructure Incidents,” in Proceedings of the 1st Annual Conference on Research in Information Technology, Utah, 2012.


[13] B. Christopher and T.-R. Eneken, “The Cyber Attack on Saudi Aramco,” Survival: Global Politics and Strategy April–May 2013, vol. 55, pp. 81-96, April 2013.

[14] R. Costin, A. H. Mohamad, B. Sergey and M. Sergey, “From Shamoon to StoneDrill: Wipers attacking Saudi organizations and beyond,” 2017.

[15] A. Kiyuna and L. Conyers, CYBERWARFARE SOURCEBOOK, 1st ed., Lulu, 2015.

[16] K. Brent, “The Vulnerability of Nuclear Facilities to Cyber Attack,” Strategic Insights, vol. 10, no. 1, p. 25, 2011.

[17] K. Eduard, “Industry Reactions to U.S. Department of Energy Cyberattacks: Feedback Friday,” 2015.

[18] SecurityWeek, “Oil and Gas Industry Increasingly Hit by Cyber-Attacks: Report,” 2016.

[19] The Industrial Control Systems Cyber Emergency Response Team (ICS-CERT), 2012. [Online]. Available: https://ics-cert.us-cert.gov.

[20] S. Chris, “Hacking oil and gas control systems: Understanding the cyber risk,” 2015.

[21] C. Yulia, B. Pete, B. Andrew, E. Peter, J. Kevin, S. Hugh and S. Kristan, “A review of cyber security risk assessment methods for SCADA systems,” Computers & Security, vol. 56, pp. 1–27, February 2016.

[22] FireEye, “SPEAR-PHISHING ATTACKS: WHY THEY ARE SUCCESSFUL AND HOW TO STOP THEM,” California, 2016.

[23] The Industrial Control Systems Cyber Emergency Response Team (ICS-CERT), 2014. [Online]. Available: https://ics-cert.us-cert.gov.

[24] S. H. Houmb, “Protecting industrial control systems,” 2015.


[25] L. John, “ISO 31000: Risk Management - A practical guide for SMEs,” International Organization for Standardization, Geneva, 2015.

[26] SAS Institute Inc, April 2017. [Online]. Available: https://www.sas.com/en_us/insights/analytics/machine-learning.html.

[27] R. Margaret, February 2017. [Online]. Available: http://whatis.techtarget.com/definition/machine-learning.

[28] P. Dimitrios and M. Michela, “MACHINE LEARNING IN CYBERSECURITY TECHNOLOGIES,” United Kingdom, 2017.

[29] Allied Business Intelligence (ABI) Research, “Machine Learning in Cybersecurity to Boost Big Data, Intelligence, and Analytics Spending to $96 Billion by 2021,” Allied Business Intelligence (ABI) Research, January 2017.

[30] R. Lior and M. Oded, Data Mining with Decision Trees: Theory and Applications, 2nd ed., A. Yun, Ed., Toh Tuck Link: World Scientific Publishing, 2015.

[31] C. Alex, “Artificial Intelligence, Deep Learning, and Neural Networks Explained,” 2016.

[32] Wikipedia, May 2017. [Online]. Available: https://en.wikipedia.org/wiki/Naive_Bayes_classifier.

[33] H. Chris, 2014. [Online]. Available: www.howtogeek.com.

[34] V. Roland, “Creating firewall rules with machine learning techniques,” Nijmegen.

[35] R. Karthik, “Instant Metasploit Starter,” in The art of ethical hacking made easy with Metasploit, India, Packt Publishing Limited, 2013, p. 52.

[36] C. David, G. Zoubin and J. Michael, “Active Learning with Statistical Models,” Journal of Artificial Intelligence Research, vol. 4, pp. 129-145, 1996.

[37] S. Greg and C. David, “Less is More: Active Learning with Support Vector Machines,” Proceedings of the Seventeenth International Conference on Machine Learning, no. 17, pp. 839-846, 2000.


[38] G. Xin, “Active Learning SVM for Blogs recommendation,” George Mason University, Virginia, 2013.

[39] W. Ian H and F. Eibe, Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed., San Francisco: Morgan Kaufmann Publishers (Elsevier), 2005.
