The use of Big Data Analytics to protect
Critical Information Infrastructures from
Cyber-attacks
Thomas Oseku-Afful
Information Security, masters level
2016
Luleå University of Technology
Department of Computer Science, Electrical and Space Engineering
Abstract
Unfortunately, cyber-attacks, which are a consequence of our increasing dependence on
digital technology, are a phenomenon that we have to live with today. As technology becomes
more advanced and complex, so have the types of malware that are used in these cyber-
attacks. Currently, targeted cyber-attacks directed at CIIs such as financial institutions and
telecom companies are on the rise. A particular group of malware known as APTs, which is
used for targeted attacks, is very difficult to detect and prevent due to its sophisticated
and stealthy nature. Such malware is able to attack and wreak havoc (in the targeted
system) within a matter of seconds; this is very worrying because traditional cyber security
defence systems cannot handle these attacks. The solution, as proposed by some in the
industry, is the use of BDA systems. However, whilst it appears that BDA has achieved greater
success at large companies, little is known about its success at smaller companies. Also, there is
a scarcity of research addressing how BDA is deployed for the purpose of detecting and
preventing cyber-attacks on CII. This research examines and discusses the effectiveness of the
use of BDA for detecting cyber-attacks and also describes how such a system is deployed. To
establish the effectiveness of using BDA, a survey by questionnaire was conducted. The
target audience of the survey was large corporations that were likely to use such systems for
cyber security. The research concludes that a BDA system is indeed a powerful and effective
tool, and currently the best method for protecting CIIs against the range of stealthy cyber-
attacks. Also, a description of how such a system is deployed is abstracted into a model of
meaningful practice.
Tom Oseku-Afful/ MSc Information Security: Thesis
LTU/Department of Computer Science and Space Engineering 1
Table of Contents
1.0 Introduction
2.0 Background theories and concepts
2.1 CII versus CI
2.2 Interdependency
2.3 Big Data
2.4 Big data analytics
2.5 Big data technologies
2.6 The cyber-attack landscape
2.7 Threat detection with BDA
3.0 The Literature review
3.1 Scope
3.2 Conceptualisation of research topic
3.3 The review method
3.4 Review analysis and synthesis
3.4.1 Part I: big data analytics for cyber security
3.4.2 Part II: models for protecting CIIs
3.4.3 Part III: big data analytics for cyber security in CII
3.5 Literature review conclusion
3.5.1 Research gaps
3.5.2 Research question(s)
4.0 Research Methodology
4.1 Justification
4.2 Questionnaire design
4.2.1 Target audience and sample size
4.2.2 Data collection
5.0 The Results
5.1 Analysis
5.2 The model
6.0 Conclusion
6.1 Research limitations
6.2 Future research
7.0 References
Acknowledgements
I would like to thank Ahmed Elragal, my supervisor, for his expert feedback, patience and advice.
This thesis would not have been possible without his astute guidance.
I have thoroughly enjoyed the entire course of information security and this is down to the
experience and expert knowledge of all the lecturers, but especially Devinder Thapa, Dan
Harnesk, Tero Paivarinta and Todd Booth. Please keep inspiring your students with your wit,
intelligence and experience.
Mum, your interest in and questions about the various topics of this course, as well as
your encouragement, kept me going. Thanks so much for your support.
To my loving and lovely wife Nana, you know I could not have done this without your tireless
support and encouragement. You are my inspiration.
And to Reuben my dear son, this work is dedicated to you and your baby brother.
Your Grace and Mercy is indeed what has brought me through. Thank you LORD!
Acronyms and abbreviations
AI      Artificial Intelligence
APT     Advanced Persistent Threats
BDA     Big Data Analytics
CI      Critical Infrastructure
CII     Critical Information Infrastructure
DiD     Defence in Depth
DOJ     Department of Justice
FIM     File Integrity Monitoring
HDFS    Hadoop Distributed File System
IDS     Intrusion Detection System
IDPS    Intrusion Detection and Prevention System
IPS     Intrusion Prevention System
IoT     Internet of Things
SIEM    Security Information and Event Management
List of tables and figures
Tables
Table 1 - journal database search
Table 2 - selected research papers for review
Table 3 - parameters of the questionnaire with justification
Table 4 - Frequency analysis for Q1
Table 5 - Frequency analysis for Q2
Table 6 - Frequency analysis for Q3
Table 7 - Frequency analysis for Q4
Table 8 - Frequency analysis for Q5
Table 9 - Frequency analysis for Q6
Table 10 - Frequency analysis for Q7
Table 11 - Frequency analysis for Q8
Table 12 - Frequency analysis for Q9
Table 13 - Frequency analysis for Q10
Table 14 - Frequency analysis for Q11
Table 15 - Frequency analysis for Q12
Table 16 - Frequency analysis for Q13
Table 17 - Frequency analysis for Q14
Table 18 - Frequency analysis for Q15
Table 19 - Frequency analysis for Q16
Figures
Figure 1 - the three V's of big data compared to traditional data
Figure 2 - big data classification into five categories
Figure 3 - 'mining' valuable information using a big data technology
Figure 4 - method 1: making existing detection system intelligent with big data
Figure 5 - method 2: using internal and external data for the analytic process
Figure 6 - method 3: streaming data from external sources for the analytics
Figure 7 - mathematical equation for calculating margin of error (ME)
Figure 8 - mathematical equation for calculating sample size n, derived from the ME equation
Figure 9 - a derived model showing how a BDA security system can be deployed
Charts
Chart 1: this chart shows the percentage distribution of the responses to question 1.
Chart 2: this chart shows the percentage distribution of the responses to question 2.
Chart 3: this chart shows the percentage distribution of the responses to question 3.
Chart 4: this chart shows the percentage distribution of the responses to question 4.
Chart 5: this chart shows the percentage distribution of the responses to question 5.
Chart 6: this chart shows the percentage distribution of the responses to question 6.
Chart 7: this chart shows the percentage distribution of the responses to question 7.
Chart 8: this chart shows the percentage distribution of the responses to question 8.
Chart 9: this chart shows the percentage distribution of the responses to question 9.
Chart 10: this chart shows the percentage distribution of the responses to question 10.
Chart 11: this chart shows the percentage distribution of the responses to question 11.
Chart 12: this chart shows the percentage distribution of the responses to question 12.
Chart 13: this chart shows the percentage distribution of the responses to question 13.
Chart 14: this chart shows the percentage distribution of the responses to question 14.
Chart 15: this chart shows the percentage distribution of the responses to question 15.
Chart 16: this chart shows the percentage distribution of the responses to question 16.
1.0 Introduction
In 2014, the initial discovery of a successful cyber-attack against a major US bank, JPMorgan
Chase, sent waves of fear and worry across Wall Street and triggered an investigation by the
FBI (Silver-Greenberg, Goldstein and Perlroth, 2016). In the news article, it was mentioned
that the hack-attack, which was launched from overseas, was not discovered until after a
month had passed; this was in spite of the fact that the bank had at that time fortified its
defences and employed the services of a top cyber security firm. Also, the article mentioned
that given the level of sophistication of the attack, and the apparent lack of profit (no
customer financial information was stolen) from the hackers, it is likely to have been
sponsored by a government; as a matter of fact, law enforcement officials and cyber security
experts believed that it may have been sponsored by elements of a particular foreign country.
This story raises the question of why the attack caused the whole of Wall Street to worry.
An answer to this can be found in another news article by Rushton (2014). Quoting statements
from the superintendent of the New York State Department of Financial services, and the
head of policy of the City of London, Rushton discusses the possibility of a cyber-attack
capable of triggering a global financial crisis. She paints a picture of a national bank
“disappearing” as a result of customer savings being wiped out, and accounts being deleted in
the event of a cyber-attack. Examining the JPMorgan hack-attack, it is quite startling to see
that such a scenario (as described by Rushton) could easily have been the case. It is not hard
to imagine that an attack (from another country) such as this (where a national bank is
brought to its knees) would not only cause a financial crisis but would also be an act of war.
Perhaps, this is the reason why the country that launched the attack refrained from causing
further harm.
Earlier this year, BT, the telecoms company that owns and maintains the physical
infrastructure that makes up the UK’s broadband network, experienced an outage of part
of its broadband services, which resulted in hundreds of thousands of customers (including
businesses) losing their Internet and phone connections for about two
hours (Williams, 2016). According to the news article, this was the biggest and most
widespread network failure in years. Although the company denies it, and blames this
network failure on a faulty router, it has been suggested that it was the result of a cyber-
attack. Regardless of whether BT’s reason (of a faulty router) for the outage is true or not, the
point is that it is clearly possible to launch a cyber-attack that can take down an organisation’s
infrastructure. Suppose this was indeed a cyber-attack, and it affected more routers and
endured for days and not hours; can one imagine the negative impact it would have had on
the nation’s economy, not to mention the lives it might affect, as emergency services lose
communication?
In both of these scenarios (the JPMorgan and BT cases), the operations of the organisations
are so intertwined with other organisations – in their respective nations – that their failure
will inevitably trigger a domino effect causing these other organisations also to fail. This
makes the running of the technology infrastructure that supports these organisations critical.
Therefore, the protection of such infrastructures, commonly referred to as critical
information infrastructures, is considered a matter of national security.
Today, to say that cyber-attacks are widespread is a major understatement. With society’s
increasing dependence on technology, and the advent of the Internet of Things (IoT),
cyber-attacks will get worse and remain an unfortunate consequence of human activities. Just as
viruses and bacteria are organisms that coexist with humans, so malware coexists with
computer systems. And just as there is a concerted effort to handle the threat of
drug-resistant bacteria, so the cyber security community must strive to find solutions to the
increasing threat of the dangerous malware that is being propagated in the cybersphere.
In the JPMorgan hack-attack, it took about a whole month for the attack to be detected. This
problem is not an isolated one. In the highly publicised Sony cyber-attack, which happened in
the same year as the JPMorgan case, it also took a while for the breach to be discovered. In
fact, it is still unclear as to how long it took, but some sources say it may have lasted for as
long as a year (Zetter, 2014). Another high-profile attack that took the organisation quite a
while to notice is the hacking of the retail giant Target. The remarkable element of
this cyber-attack is that although the attack was detected by the security detection systems
the organisation had in place, the personnel somehow missed it and only reacted when they
were alerted by the Department of Justice (Finkle and Heavy, 2014).
The fact that these attacks were not discovered until it was too late shows the level of
sophistication of the malware out there. According to a Sophos Security Threat Report
(Eschelbeck, 2014), modern malware has evolved into something more sinister,
sophisticated and stealthy, such as that used for Advanced Persistent Threat (APT)
attacks, and these forms of attack have been on the rise in recent years. The report also mentions
that APTs are perpetrated by highly knowledgeable, skilful and motivated individuals who are
very organized and well-funded and that, therefore, it is becoming increasingly hard to detect
and defend against such attacks.
Another report points out that cyber-attacks against critical infrastructure systems are on the
rise, and in some cases the objective is to destroy rather than steal data (Trend Micro, 2015).
Although the report focuses on the Americas, a case can be made for the fact that this is a
global phenomenon. The report also mentions that the nature of the malware used in these
attacks shows a noticeable increase in the knowledge of critical infrastructure systems
(especially SCADA systems). In other words, the cyber criminals are becoming more
knowledgeable and sophisticated, and this shows in the lethal malware they are using to
attack organisations. Also, the attacks are directed against specific critical information
organisations.
So how does one defend against such attacks? Could big data analytics (BDA) be the answer?
In a press release in 2014, a prediction was made that by 2016, large corporations would turn to
big data analytics for cyber detection and defence (Rivera, 2014). In the press release, it is
mentioned that big data analytics will allow the organisations that use it to see a bigger and
clearer picture of threats, and therefore be able to detect and prevent these attacks from
happening. Also, it is mentioned (in the press release) that the technology (i.e. big data
analytics) is still in its infancy, and its effectiveness has barely been proven.
Hence, in this research, a study of the effectiveness of big data analytics – for cyber-attack
detection – will be carried out. This will be done by examining the success rate (via a survey
by questionnaire) in using the technology for the detection of sophisticated and stealthy
cyber-attacks such as APTs. Given the fact that stealthy malware is designed to be undetected
and an attack can compromise a computer system in a matter of seconds (Brewer, 2015), the
word “effectiveness” is defined as: having a speed of detection that is within seconds, minutes
or hours, but no more than a day, as even a day might be too late; and being able to detect
stealth attacks significantly more often than not, that is, with at least a 75% success rate.
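The two criteria in this definition can be written down as a simple check. The following is only a minimal sketch: the function name and its inputs are hypothetical, chosen for illustration, and are not variables taken from the survey.

```python
from datetime import timedelta

# Thresholds taken from the definition of "effectiveness" given above.
MAX_DETECTION_DELAY = timedelta(days=1)  # detection within "no more than a day"
MIN_SUCCESS_RATE = 0.75                  # "at least 75% success rate"


def is_effective(detection_delay: timedelta, detected: int, total_attacks: int) -> bool:
    """Return True only if both criteria of the definition are met.

    detection_delay, detected and total_attacks are hypothetical inputs
    used for illustration; they are not survey variables.
    """
    if total_attacks <= 0:
        raise ValueError("total_attacks must be positive")
    fast_enough = detection_delay <= MAX_DETECTION_DELAY
    success_rate = detected / total_attacks
    return fast_enough and success_rate >= MIN_SUCCESS_RATE


# Detected within three hours, 8 of 10 stealth attacks caught: meets both criteria.
print(is_effective(timedelta(hours=3), detected=8, total_attacks=10))   # True
# A month to detect (as in the JPMorgan case) fails the speed criterion.
print(is_effective(timedelta(days=30), detected=9, total_attacks=10))   # False
```

Both conditions must hold at once: a system that detects most attacks, but only after weeks, would not count as effective under this definition.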
Also, a model of meaningful practice of how BDA is deployed (for cyber security) will be
developed. This model will be based on how various organisations have used big data
analytics to achieve success in detecting and quelling cyber-attacks. In other words, questions
about the kind of big data analytics operation these organisations use for the detection of
stealthy attacks will be asked: perhaps their success arises because the big data
analytics tool is combined with the organisation’s traditional cyber security detection
systems; or maybe the traditional systems are completely replaced with the big data analytics
tool.
The main contribution of this research is to shed more light, albeit a little, on the relatively
complex area of applying big data analytics in cyber security. In addition, the model of
meaningful practice can serve as an initial guide (of how a BDA system can be deployed) for
new organisations that wish to embrace this technology for cyber security.
2.0 Background theories and concepts
This section presents a discussion of the concepts, technologies and cyber threats that
underpin this research. Although the study is primarily about the detection and prevention
of cyber-attacks using big data analytics, it specifically addresses how big data analytics can
be used for the protection of critical information infrastructure (CII) organisations from
cyber-attacks. There are different types of cyber threats that these organisations face, but the main
threat of concern is the use of stealthy malware, which is notoriously difficult to detect and
prevent. Therefore, explaining what these background concepts are is necessary and relevant.
2.1 CII versus CI
A search (in any journal database, or indeed a search engine such as Google) for critical
information infrastructures (CII) will invariably return results about critical infrastructures
(CI) as well. In fact, CIs and SCADA systems dominate the search results. It seems the two
concepts (CI and CII) are treated as one and the same. Therefore, a clarification will be useful.
In other words, what is the difference between a CII and a CI?
To address this apparent confusion of the two concepts, the explanation offered by Cavelty
and Suter (2012) makes sense. They explain that a CI consists of all critical sectors of a nation’s
infrastructure of which a CII is a subset. They describe a critical information infrastructure as
a system which is part of a global or national information infrastructure that is essential for
the continuity of critical services, which includes banking and finance, hospitals and utility
services. There are two sides to it: a physical component side, which includes equipment such
as high-speed networks, satellites, wireless communication networks, televisions, radios,
phones and computers; and an immaterial side, which is the information and content that is
stored on and flows through it (the physical component). Typical examples of a CII are a
datacentre that serves millions of people and a financial system such as a stock market.
This research will focus on CII and not necessarily CI, although it must be mentioned that some
strategies for protecting CIIs will definitely overlap with the protection of CIs as well. It must
also be mentioned that the protection of a CII is considered a matter of national security.
2.2 Interdependency
Given the definition of a critical information infrastructure, it is quite obvious that a CII for
any nation serves an important purpose for that nation’s development, but what is the big
deal if, say, a high-speed network or a satellite is taken offline by accident or design? After all,
people probably will not die and though it will be expensive, these infrastructures can be
rebuilt, one might say. So what makes their (CII) existence so critical that it must be protected
as a matter of highest priority? In other words, why is their protection considered a matter of
national security?
To answer this question fully, one must understand the concept of interdependency.
Essentially, this means that critical information infrastructures do not exist in a vacuum but
they are connected to a range of other important infrastructures and services. This means
that a failure in one of them affects the others. Rinaldi, Peerenboom and Kelly (2001) explain
this idea of interdependency by looking at the case of the Galaxy 4 telecommunications
satellite failure in 1998, which resulted in a loss of pager service to about 45 million customers
(Rosenbush, 1998). At the time, people (especially in the US) relied on pagers for
communication. Rinaldi, Peerenboom and Kelly add that the event caused many
disruptions, including the disruption of important financial transactions such as card
purchases, and also threatened lives by disrupting communications between doctors and
emergency services. This shows that the failure of one CII can have a domino effect on other
CIIs and CIs. If the failure of a relatively simple communication device such as a pager can
cause such serious disruptions (including threats to human lives), can one imagine the scale
of the disruptions that can arise if the CII (as in the BT case mentioned in the introduction)
that supports the use of current communication devices such as smartphones, tablets and a
host of other systems is taken down? Even worse, the scale of these potential disruptions
expands exponentially when one factors in emerging technologies such as the Internet of
Things (IoT), which is fostering more dependency on technology, by both individuals and
organisations, for their respective daily activities and operations.
It must be mentioned here that the increasing dependence on the use of technology (by
people and organisations), and the pervasiveness of the IoT have added more fuel to the
phenomenon called Big Data.
2.3 Big Data
According to Cisco, global IP traffic will exceed 1000 exabytes (i.e. 1 zettabyte) by the year
2016 (Cisco, 2015). To put the quantity of data being discussed here into perspective, 1
zettabyte of data has been compared in volume to the Great Wall of China (Arthur, 2011). This
explosion of data is what is referred to as big data.
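As a quick check of the unit arithmetic above, the conversion can be worked through with decimal (SI) prefixes. This is only an illustrative sketch; the terabyte comparison is added here for scale and does not come from the cited sources.

```python
# Decimal (SI) storage units, expressed in bytes.
TERABYTE = 10**12
EXABYTE = 10**18
ZETTABYTE = 10**21

# 1 zettabyte expressed in exabytes, matching the "1000 exabytes" in the text.
exabytes_per_zettabyte = ZETTABYTE // EXABYTE
print(exabytes_per_zettabyte)  # 1000

# The same quantity in terabytes, a unit closer to everyday storage sizes
# (this comparison is an added illustration).
terabytes_per_zettabyte = ZETTABYTE // TERABYTE
print(terabytes_per_zettabyte)  # 1000000000
```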
However, big data is not just about volume. It is also about variety and velocity. Variety refers
to a range of different types or formats of data such as video, audio, images, text messages
and email, as well as sensor and machine-generated data. Velocity refers to the speed
(including real time) at which these data are generated, processed and transmitted.
Therefore, though there are other characteristics, big data is essentially characterized by the
so-called “three Vs”: volume, variety and velocity (Gartner, 2012).
Due to the nature of big data, it cannot be easily categorised and organised into a traditional
database. Figure 1 (Cloud Security Alliance, 2013) gives a good illustration of the three Vs of
big data by contrasting it with data that is traditionally used and processed in relational
database management systems.
[Figure 1 panels: Volume (traditional data vs. big data); Velocity (traditional data vs. big data); Variety (traditional data: homogeneous and structured; big data: heterogeneous and unstructured)]
Figure 1 - the three V's of big data compared to traditional data
Figure 2 - big data classification into five categories
A further explanation of the nature of big data is presented by Hashem et al. (2015). They
explain that big data can be categorised according to five aspects: data sources,
content format, data stores, data staging and data processing. This categorisation is aptly
depicted in a diagram (Hashem et al., 2015), and an adapted, more visual version of
this diagram is presented in Figure 2.
Today, big data has become very important to organizations because of the wealth of
information that can be gleaned from it. The “mining” of big data for relevant information is
referred to as big data analytics.
2.4 Big data analytics
Big data analytics is quickly becoming an indispensable tool in our increasingly digitised
society. It is used not only by big corporations to aid decision making, but for many other
disciplines including artificial intelligence (AI), health related research and information
security.
McAfee and Brynjolfsson (2012) explain the importance of big data analytics by giving an
example whereby real-time location data from users’ smartphones were used to determine
how many shoppers were in Macy’s parking lots on Black Friday, at the start of the Christmas
shopping season in the US. This data then allowed analysts to estimate the retailer’s sales
even before the actual sales had been recorded.
Also, combined with machine-learning algorithms, big data has been used to create artificial
intelligence (AI) systems that are better at performing tasks that in the past only humans
could do. A typical example is IBM’s Watson, which beat the best minds at the game of
Jeopardy in 2011 (Ferrucci, 2012). Another example of machine learning is driverless cars;
although these cars have not yet surpassed humans, tests (on certain selected roads) show
that these machines have mastered the complex art of driving (Gibbs, 2014).
The point here is that big data analytics can provide a powerful tool for organisations to make
smarter and better decisions because it can give a better picture of any particular event even
before it happens. This makes big data analytics a potent tool for the detection of
cyber-attacks. Tankard (2012) suggested how big data analytics can be applied to
detect cyber-attacks when he discussed the advantages of big data. He explains
that organizations can mine the huge amounts of data they have been collecting for potential
cyber security events such as malware and phishing attempts.
A range of technologies is used to conduct big data analytics. The general consensus amongst
most experts in this field is that the whole big data phenomenon is still in its early days, and
this seems to be supported by the raft of changing technologies (including storage
applications, machine-learning algorithms for analytics and user interfaces) that are
appearing on the market today.
2.5 Big data technologies
In previous sections, it was established that, like any form of analytics, big data analytics
involves the careful examination of the (big) data in order to obtain meaningful and useful
information. However, given its nature, any kind of examination of big data will be neither
easy nor straightforward. Innovative solutions have therefore been required to make the
mining of meaningful information from big data as easy as possible. Given the rise in the
number of technologies (for big data analytics) on the market today, it seems a lot of
progress is being made.
In this section, the general structure of a big data technology or system is described. Just as
big data is complex and diverse, so are its current technologies. Elgandy and Elragal (2014)
discuss a framework – dubbed B-DAD – which makes it easier for one to understand the
general structure of big data technologies. This framework consists of three main areas:
storage and architecture; data and analytic processing; and the analyses (results).
Like Elgandy and Elragal (2014), Hu et al. (2014) use a three-layered architecture model to
describe big data systems, which consists of an infrastructure layer, a computing layer, and
an application layer. The infrastructure layer consists of a network of storage systems
enabled by cloud computing and virtualization. In other words, the infrastructure layer
consists of a system of distributed hardware that is used to store the big data. This
means that the data might not necessarily be found in one particular location or system, but
spread across multiple servers in different locations. The computing layer, also known as the
middleware layer with respect to this three-layered model, consists of the software tools that
are used for the management and integration of the data. The application layer consists of
the application or software that is used for implementing the required data analytics. A more
graphical variation of the illustration that Hu et al. (2014) use to describe their model is
presented in Figure 3.
Figure 3 - 'mining' valuable information using a big data technology
Although the infrastructure layer is indispensable and big data analytics is possible today
because of advancements in storage capacity and micro-processors, big data technologies
usually comprise the computing and application layers. At the application layer, a big data
system can be classified into two main groups. One may recall that one of the five categories
of big data (as shown in Figure 2) is data processing, of which there are two main types:
batch and real time (or stream). Based on these two types of processing, big data
technologies are grouped into batch and stream. Batch processing is when the analytics is
performed on data at rest, and stream processing is when the analytics is carried out on data
in motion (Cloud Security Alliance, 2013).
As in any traditional batch processing system, time is not necessarily of the essence in batch
analytics. Thus, in batch analytics, data is stored over a period of time before it is analysed. A
typical example of a batch processing big data technology is Hadoop (Cloud Security Alliance,
2013). Hadoop (or Apache Hadoop software library, as it is formally called) is a framework
that enables the processing of large volumes of data across clusters of distributed systems in
a simple manner. It can operate on a single server or scale horizontally to cover thousands of
servers, with each of them providing their own local storage and processing. The framework
or ecosystem consists of four modules including the Hadoop Distributed File System (HDFS)
and Hadoop MapReduce. HDFS is used for the storage and management of application data,
and MapReduce is used for parallel processing of large volumes of data. The other two
modules are Hadoop Common and Hadoop YARN (Hadoop.Apache.org, 2016).
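The map and reduce phases mentioned above can be illustrated with a minimal, single-process sketch in plain Python. It does not use Hadoop itself. The log records, their field layout and the function names (map_phase, reduce_phase, run_batch_job) are hypothetical, chosen only to show how a batch job might count failed logins per source address in data at rest.

```python
from collections import defaultdict

# Hypothetical stored log records (data at rest), format: "source_ip,event"
STORED_LOGS = [
    "10.0.0.5,login_failed",
    "10.0.0.5,login_failed",
    "10.0.0.9,login_ok",
    "10.0.0.5,login_failed",
    "10.0.0.7,login_failed",
]


def map_phase(record):
    """Emit (source_ip, 1) for every failed login; emit nothing otherwise."""
    source_ip, event = record.split(",")
    if event == "login_failed":
        yield source_ip, 1


def reduce_phase(pairs):
    """Sum the counts emitted for each key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def run_batch_job(records):
    """Apply the map step to every stored record, then reduce the emitted pairs."""
    emitted = (pair for record in records for pair in map_phase(record))
    return reduce_phase(emitted)


failed_logins = run_batch_job(STORED_LOGS)
print(failed_logins)
```

In a real cluster the map step would run in parallel on the nodes that hold the data and the reduce step would combine their partial results; here both run in one process purely for illustration.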
Other technologies that take advantage of the Hadoop framework to perform different tasks
include Pig, Mahout, Spark, Hive, HBase and Cassandra. Pig is a high-level language used for
performing data analysis; Mahout is used for generating machine learning and data mining
algorithms; the key functionalities of Spark are machine-learning, stream processing and
graph computation; and like HDFS, Hive, HBase and Cassandra are used for storing large
volumes of data (Hadoop.Apache.org, 2016).
Unlike in batch analytics, time is critical in stream analytics. This is based on the
idea that the value of certain data (such as instant messages, for example) is bound by its
velocity. Hence, in stream analytics, data is analysed as soon as it arrives, and stream
analytics is therefore also known as real-time analytics. It is usually used for online
applications where data arrives within seconds and/or milliseconds (Hu et al., 2014). One of
the most talked about technologies for stream analytics is Apache Storm. It is an open source
real-time computation system, which is easy to use, reliable, and compatible with any
programming language (Storm.Apache.org, 2015). Other big data technologies for stream
analytics are Apache Spark and Apache Kafka, which are quickly becoming the analytics engines
of choice for most organisations.
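The idea of analysing data as soon as it arrives can be sketched with a small sliding-window counter in plain Python. This is an illustrative assumption about how a stream rule might look, not the API of Storm, Spark or Kafka. The class name, the window length and the threshold are all hypothetical.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events seen within the last `window_seconds` seconds."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.timestamps = deque()

    def observe(self, timestamp):
        """Record one event; return True if the window now exceeds the threshold."""
        self.timestamps.append(timestamp)
        # Drop events that have fallen out of the time window.
        while timestamp - self.timestamps[0] > self.window_seconds:
            self.timestamps.popleft()
        return len(self.timestamps) > self.threshold


# Hypothetical event arrival times, in seconds.
counter = SlidingWindowCounter(window_seconds=10, threshold=3)
event_times = [0, 1, 2, 3, 20, 21]
alerts = [counter.observe(t) for t in event_times]
print(alerts)
```

Each event is evaluated at the moment it is observed, so the fourth event (the first to push the window above the threshold) raises an alert immediately rather than at the end of a batch.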
It must be mentioned here that a key problem that still exists in big data technologies is the
pre-processing stage, where the data in its raw state is transformed into a more
understandable format using techniques such as cleansing, integration, transformation and
reduction. The effectiveness of a big data technology can therefore be determined by the
pre-processing techniques it uses (Tasi et al., 2015).
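The four pre-processing techniques named above can be sketched as small functions in plain Python. The record layout, the two data sources and the function names are hypothetical and serve only to show what each step does to the raw data.

```python
# Hypothetical raw records from two sources.
firewall_events = [
    {"ip": " 10.0.0.5 ", "bytes": "1200"},
    {"ip": "10.0.0.7", "bytes": None},  # incomplete record
    {"ip": "10.0.0.5", "bytes": "800"},
]
proxy_events = [
    {"ip": "10.0.0.9", "bytes": "300"},
]


def cleanse(records):
    """Cleansing: drop records with missing values and strip stray whitespace."""
    return [
        {"ip": r["ip"].strip(), "bytes": r["bytes"]}
        for r in records
        if r["ip"] and r["bytes"] is not None
    ]


def integrate(*sources):
    """Integration: merge records from several sources into one list."""
    return [record for source in sources for record in source]


def transform(records):
    """Transformation: convert the byte counts from text to integers."""
    return [{"ip": r["ip"], "bytes": int(r["bytes"])} for r in records]


def reduce_by_ip(records):
    """Reduction: collapse many records into one total per IP address."""
    totals = {}
    for r in records:
        totals[r["ip"]] = totals.get(r["ip"], 0) + r["bytes"]
    return totals


merged = integrate(cleanse(firewall_events), cleanse(proxy_events))
prepared = reduce_by_ip(transform(merged))
print(prepared)
```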
2.6 The cyber-attack landscape
During a speech at a cyber security conference in 2012, the FBI Director made quite an
unusual statement: “I am convinced that there are only two types of companies: those that
have been hacked, and those that will be” (Mueller, 2012). Unfortunately, this is the sad
reality in today’s digitised world. Cyber-attacks have become so mundane that organisations
are not even surprised when they happen. According to a security report (Internet Security
Threat Report, 2016), more than 430 million new pieces of malware were discovered in 2015,
and what was even more remarkable is that this finding came as no surprise to the
researchers. The security report goes on to explain that targeted, sophisticated and persistent
attacks against government organisations and businesses of all sizes are on the rise and pose
a serious threat to national security and economy.
Cyber-attacks come in different shapes and forms – from common viruses to highly
sophisticated malware such as cyber-weapons. van Kessel and Allan (2014) provide a list of
the most likely sources of cyber-attacks, and it includes the usual suspects of disgruntled
employees, hacktivists and criminal syndicates. In the information security field, it is almost
taken for granted, with good reason and evidence, that internal threats (from employee
activities) pose the greatest risk to the security of information in any organisation, but
this report (van Kessel and Allan, 2014) makes a startling statement which suggests that this
is changing. The report explains that the combined activities of external attackers are now
significantly more likely to be a threat source than internal threats.
External attackers include criminal syndicates, state sponsored hackers, hacktivists and lone-
wolf hackers. The modus operandi of these attackers include the use of highly sophisticated
malware, which cannot be easily detected even with equally highly sophisticated security
systems (van Kessel and Allan, 2014). For example, in the Sony attack, it was reported that
the malware used could have slipped past most of the network defences out there today (Lee,
2014).
Since the advent of Stuxnet – a rather insidious and stealthy piece of malware that was used to
attack the Iranian nuclear facility – similar types of malware have emerged in cyberspace.
Known examples of these types of stealthy malware, apart from Stuxnet, include Duqu, Flame
and Red October; they are collectively called APTs (Virvilis and Gritzalis, 2013).
Virvilis and Gritzalis (2013) describe an APT as having the following general features: they are
usually directed at specific, high-value targets and are therefore built for particular operating
systems or platforms; they usually have an initial attack vector such as malicious office
documents or removable drives; they are equipped with a list of evasion techniques in order
to bypass anti-virus software and intrusion detection systems (IDS) using command and
control techniques; their evasive techniques include encryption of their network
traffic; they use stolen but legitimate digital certificates, which fool the targeted systems into
treating them as safe.
These features (of an APT) make it very difficult for even hardened and sophisticated systems
to detect them. Thus their detection relies heavily on conducting manual investigations and
the expertise of human analysts (Cloud Security Alliance, 2013).
2.7 Threat detection with BDA
Traditionally, the range of systems for detecting and preventing cyber-attacks can be grouped
as follows: antivirus programs; network IDS/IPS; host IDS/IPS; network device events; logging;
FIM and whitelisting; and SIEM. Although these systems are useful in many ways, they are
proving to be largely ineffective against current types of stealthy cyber-attacks. This is
because, apart from operating independently from each other, these systems generate a
huge amount of data which is difficult and time consuming to analyse without the correct
tool; thus it is easy to miss key cyber-attack events (Shackleford, 2016).
This suggests that with the correct deployment of the right tool (such as a BDA technology),
which can sift through the data much quicker, these disparate systems can be made more
efficient and effective. Laitan (2014) gives the example of an organisation that employed about
35 staff to monitor 135,000 data loss prevention (DLP) alerts per day and that, by using a big
data analytics system, managed to reduce this considerably using only a handful of staff.
So how does an organisation detect cyber-attacks or threats using a BDA system? Laitan
(2014) discusses three main approaches to deploying a big data analytics tool for cyber
security, which are based on the source of the data and the analytics setting (canned or ad
hoc). An adaptation of the illustrations used for explaining these approaches or methods, as
presented in the report, is shown in Figures 4, 5 and 6.
The first method, as described in the report (Laitan, 2014), involves making existing systems
– such as SIEM, DLP and DAP – more intelligent and less noisy so that only the most dangerous
cyber-attacks (e.g. APTs) are flagged and isolated. This means that the analytics setting of the
big data system used is usually canned. Also, the data to be used will be derived internally
from the organisation’s databases, servers and applications. This method is illustrated in
Figure 4.
Figure 4 - method 1: making existing detection system intelligent with big data
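One way to picture how existing detection systems might be made "less noisy" is a simple correlation rule that only flags a host when several independent systems report it. The sketch below is a hypothetical canned rule written in plain Python. It is not the method described in the report, and the alert data, the function name and the min_systems parameter are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical alerts from existing systems: (detection system, host)
ALERTS = [
    ("SIEM", "host-a"),
    ("DLP", "host-a"),
    ("DLP", "host-b"),
    ("DLP", "host-b"),
    ("DAP", "host-c"),
    ("SIEM", "host-c"),
    ("DAP", "host-c"),
]


def correlate(alerts, min_systems=2):
    """Return the hosts reported by at least `min_systems` distinct systems."""
    systems_by_host = defaultdict(set)
    for system, host in alerts:
        systems_by_host[host].add(system)
    return sorted(
        host
        for host, systems in systems_by_host.items()
        if len(systems) >= min_systems
    )


flagged_hosts = correlate(ALERTS)
print(flagged_hosts)
```

Here seven raw alerts are reduced to two flagged hosts; repeated alerts from a single system (host-b) are treated as noise under this particular rule.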
In the second method, the data (for the analytics) is sourced from internal and external
sources (such as online and mobile activities), and the analytics setting is customised (or ad
hoc). This means the organisations can set their own search criteria, and in some of the big
data analytics systems used, searches for malicious activities can be performed ‘in google-like
fashion’. This method is illustrated in Figure 5.
Figure 5 - method 2: using internal and external data for the analytic process
Figure 6 - method 3: streaming data from external sources for the analytics
In the third method (see Figure 6), the analytics is performed mainly on external data about
threats and the activities of various bad actors. This means that the big data analytics system
is designed to comb through the Internet (both dark and public) for malicious activities against
organisations. Perhaps this is the reason why, in the Target cyber-attack (see section 1), the
DOJ, using systems that search the Internet for cyber security events, was able to
detect the cyber-attack before Target became aware of it.
In essence, a big data technology for cyber security can be described as having the following
key features (Sullivan, 2016):
• It must have the ability to scale, as smoothly as possible, to accommodate the increasing
size of the security data being collected (from both internal and external sources) without
losing performance in its functionalities. This means that the analytics engine must be able
to handle the data as it scales horizontally across distributed storage systems. Also, the
storage systems it uses must be persistent with low data latency. In other words, the
database must be capable of keeping copies of the original data even after it has been
modified, and data access must be quick.
• It must have a reporting and visualization function which will allow the information (after
the analytics – canned or ad hoc) to be presented in a way that will be useful and
meaningful to security analysts.
• The source of the data (for the analytics) must be in context. In other words, analysing
weather data for cyber-attack events might not be a good idea. Using just any data might
result in higher than necessary false positives, or even worse, it might result in false
negatives. This means that the source of the data for the analytics is of absolute
importance.
3.0 The Literature review
In order to ensure that the review of literature for this research is rigorous and thorough, a
systematic approach was adopted; this is the framework of literature review proposed by
Vom Brocke et al. (2009). The framework suggests that a rigorous literature review should
consist of the following phases: scope definition; conceptualisation of the topic; literature
search; literature analysis and synthesis; and conclusion.
3.1 Scope
The scope for this research is based on the relevant topics associated with the study, and the
time frame for the selection of past research papers. The relevant topics are identified as “big
data analytics”, “cyber-attacks”, and “critical information infrastructure protection”; and the
time period is determined to be the past five years. The time span (of the past “five years”)
was chosen because one of the key topics, big data analytics, is a relatively new phenomenon,
and it is therefore unlikely that any study published before this period would be relevant (to
this research).
3.2 Conceptualisation of research topic
Considering the research topic, the key terms for the basis for this research are identified to
be: big data analytics for cyber-attacks detection; models or methods for critical information
infrastructure protection; and using big data analytics for the detection of cyber-attacks in
critical information infrastructures.
3.3 The review method
The main approach to the literature search was keyword searches, using the key terms
identified during the “conceptualisation” phase, in a range of selected but relevant journal
databases including Elsevier Science Direct, IEEE Xplore, Emerald Journals and ACM Digital. In
other words, these key phrases were typed directly into these journal databases. The result
of this search is summarised in Table 1. One can observe (from Table 1) that this approach
alone did not yield enough relevant results. Therefore, to support this approach, public
databases such as Google Scholar proved to be very useful in that, given the relevant
keywords, it returned a list of research papers and their corresponding journals; this made it
easier to identify where to obtain the research paper of interest. For example, entering the
keywords, “Big data analytics for cyber security”, in Google Scholar yielded a list of research
papers accompanied by a brief description. This allowed for a quick read about the contents
of each paper, and then after selecting a paper of interest (e.g. “Big Data Analytics for cyber
security: a review of trends, techniques and tools”), the relevant journal database (IEEE) was
used to obtain it (when it was not readily available in Google Scholar). After obtaining the
particular paper, a backward and forward search was then conducted in order to obtain other
relevant paper(s).
Database                 Keywords                                      Fields  Hits   Reviewed
IEEE Xplore              Big data analytics for cyber security and/or  Any     20     1
                         cyber-attack detection                        All     0      0
                         Models or methods for protecting critical     Any     0      0
                         information infrastructures                   All     0      0
                         Using big data analytics for the detection    Any     40     1
                         of cyber-attacks in critical information      All     0      0
                         infrastructures
Elsevier Science Direct  Big data analytics for cyber security and/or  Any     75     2
                         cyber-attack detection                        All     0      0
                         Models or methods for protecting critical     Any     150    0
                         information infrastructures                   All     0      0
                         Using big data analytics for the detection    Any     110    3
                         of cyber-attacks in critical information      All     0      0
                         infrastructures
Emerald Journals         Big data analytics for cyber security and/or  Any     1      0
                         cyber-attack detection                        All     0      0
                         Models or methods for protecting critical     Any     >1000  0
                         information infrastructures                   All     0      0
                         Using big data analytics for the detection    Any     0      0
                         of cyber-attacks in critical information      All     0      0
                         infrastructures
ACM Digital              Big data analytics for cyber security and/or  Any     0      0
                         cyber-attack detection                        All     0      0
                         Models or methods for protecting critical     Any     3      0
                         information infrastructures                   All     0      0
                         Using big data analytics for the detection    Any     0      0
                         of cyber-attacks in critical information      All     0      0
                         infrastructures
Table 1 - journal database search
Note: the Reviewed column (in Table 1) does not mean that only that number of papers was
examined. It simply indicates the number of papers (from the respective database searches)
that were reviewed for this research. The final list of the relevant research papers for review
is presented in Table 2. They are grouped according to the concepts discussed in this
research.
Papers Concepts
A B C
Ahn, Kim and Chung (2013) x x
Gandomi and Haider (2014) x
Ghazal et al (2012) x
Constantine (2014) x
Hurst, Merabti and Fergus (2014) x
Dunn-Cavelty and Suter (2009) x
Mouton and Ellefsen (2013) x
Curry et al (2013) x
Tsai et al (2015) x
Laitan (2014) x
Ulltveit-Moe (2013) x
McLaughlin et al (2014) x
Ma, Smith and Skopik (2013) x
Tsegaye and Flowerday (2014) x
Cardenas, Manadhata and Rajan (2013) x
Ragupathi and Ragupathi (2014) x
Everett (2015) x
Tankard (2012) x
Chen, Chiang and Storey (2012) x
Slavakis, Giannakis and Mateos (2014) x x
Mahmood and Afzal (2013) x
Kambatla et al (2014) x
Puri and Dukatz (2015) x
Hipgrave (2013) x
Tannahill and Jamshidi (2013) x
Smith and Watson (2013) x x
Tasi et al (2015) x
Aniello et al (2014) x
Table 2 - selected research papers for review
Table 2 Key:
Concept A: big data analytics for cyber security and/or cyber-attack detection
Concept B: models or methods for protecting critical information infrastructures
Concept C: using big data analytics for the detection of cyber-attacks in critical information
infrastructures.
3.4 Review analysis and synthesis
This literature analysis focuses on the relevance of big data analytics to information security,
how it has been applied for the detection of cyber-attacks in information systems in general,
and in CIIs in particular. It is based on the three concepts (A, B, C) outlined in Table 2.
3.4.1 Part I: big data analytics for cyber security
The Information Security Forum (ISF) compiled a report about the potential positive impact
and improvement big data analytics can have on information security. The report states that
organisations need to move away from reacting to cyber-attack incidents and move towards
detecting and preventing such incidents (Information Security Forum, 2012). The report
concludes that though current big data analytics can be used to improve information security
by reducing risks and increasing agility, the technology is not quite mature in the information
security industry.
Tankard (2012) agrees with the observations made by the ISF (2012) and in his article about
big data and security he explains that, of the many advantages of big data analytics, the most
compelling is operational efficiency (for commercial organisations). One can argue that this
“operational efficiency” includes cyber-attack detection because keeping a system secure is
part of an organisation’s operations. He goes on to explain that apart from commercial
organisations, big data analytics can be useful to governments for the detection of threats
from foreign countries, terrorists, hacktivists and criminal elements in the real world and in
cyberspace. Essentially, Tankard’s argument is that because the information obtained from
big data analytics is of high value (to the organisation that has obtained it), it will inevitably
be a target for cyber-attacks, but the big data analytics itself can also be used to prevent such
attacks. In other words, big data analytics can be used by organisations to increase
productivity and at the same time make their systems more secure. He adds that, for big data
analytics to be effective, the security access controls should be moved away from the network
perimeter and closer to the data asset that needs protection.
Like Tankard (2012), Cardenas, Manadhata and Rajan (2013) extol the use of big data
analytics, but they focus more on its uses for cyber security. They explain that the idea of data
analysis for cyber-attack detection is not new: the information security community has been
monitoring network traffic and analysing system logs and other sources of data in order to
detect threats and malicious activities for more than a decade. However, big data analytics
has overcome many of the challenges that faced this traditional form of data analysis (for
security). One of these challenges was the inability to perform long-term and large-scale
analytics, because it was not economically feasible to keep large volumes of data for a long
period. They explain that one of the main impacts of big data technologies is the facilitation
of the development of affordable infrastructures (such as storage and maintenance) for
security monitoring by various
industries, thus making it possible for large scale analytics to be carried out. However, they
argue that despite the significant promise of big data analytics for security, there are several
challenges, such as privacy laws, that can prevent this development from realising its true
potential if not addressed. They also caution that big data analytics is not a panacea for cyber
security and therefore, security specialists will have to continue researching new ways to curb
sophisticated attacks.
Examining the potential of big data analytics, Curry et al. (2013) predicted that it will change
the status quo of most information security products (i.e. network monitoring,
authentication, fraud detection, IDS, etc.) and will evolve to have advanced predictive and
real-time features. They state that just as big data analytics has transformed the competitive
dynamics of commercial organisations, it will also make the information security sector
better. They explain that big data analytics is especially relevant because a system better than
the traditional ones is needed to defend against cyber attackers who are becoming more
sophisticated and are able to stage highly targeted and complex attacks. Also, attack surfaces
are becoming broader and more diverse as organisations dissolve network boundaries by
allowing data and application access through cloud services and mobile devices. Therefore,
they argue that only a system that is more agile and uses dynamic risk assessments and
analysis of big data will be enough to handle this situation.
Discussing the time taken for cyber-attacks to happen and the response times of
organisations, Brewer (2015) echoes the idea of Curry et al. (2013) that the network
perimeter (for organisations) is becoming wider. Brewer goes as far as to say that the
network perimeter does not exist at all in the current climate of computing, and therefore
insisting on blocking attacks at the perimeter is bound to fail. He adds that cyber-attacks
are stealthy and happen very rapidly (in minutes and seconds), which makes them difficult
to prevent. He explains that many organisations operate in a mode where it
takes weeks or months to detect cyber-attacks. He therefore proposes a “fundamental shift”
from prevention to detection by using big data analytics to detect and respond to cyber-
attacks quickly (i.e. as they happen).
Mahmood and Afzal (2013) are so convinced that big data analytics is the solution to the
growing threat of cyber-attacks faced by organisations that, not only do they encourage
organisations to embrace it, they prescribe how it can be implemented. They, like Cardenas,
Manadhata and Rajan (2013), argue that the traditional security solutions are simply not good
enough, especially in this era of big data (where data can arrive and disappear in seconds) and
the ever increasing attack skills of cyber criminals. They explain that for a big data analytics
solution to be effective, the data (for the analytics) will have to be pooled from diverse
sources; the analytics engine must be sophisticated and cutting edge; and the user interface
must be interactive.
Ahn, Kim and Chung (2013) discuss the reason why traditional security solutions are simply
not good enough, and explain that these (traditional) security solutions, such as firewalls, anti-
viruses, intrusion detection and prevention systems, were based on signatures and
characteristics of known malware. Therefore, new malware (especially APTs) which had no
signatures were impossible to detect using these systems. To overcome this problem, they
propose a big data analytics-based model. Essentially, this model (they propose) consists of
three main features: data collection; data processing or analytics; and an alert and report
service. The data collection feature works by pooling data from traditional sources (including
existing databases and security logs) into one place. This data is then mined, using current
data mining techniques, for attack or abnormal behaviours; if any of these are detected, the
result is passed on to the alert and report service. The problem one may find with this solution
is that the data pool is still traditional – it does not seem to be “big”. Apart from using smart
analytics, a key aspect of big data analytics is that the data must be “big”; this means that the
data must be as diverse as it is extensive and also include ephemeral data.
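The three features of the model described above (data collection, analytics, and an alert and report service) can be sketched as a small pipeline in plain Python. This is an illustrative sketch under stated assumptions, not the implementation proposed by Ahn, Kim and Chung (2013). The log records, the baseline figures, the function names and the use of a simple z-score as the "abnormal behaviour" test are all hypothetical.

```python
from statistics import mean, stdev


def collect(*sources):
    """Data collection: pool records from several sources into one list."""
    return [record for source in sources for record in source]


def find_abnormal(records, baseline, z_threshold=3.0):
    """Analytics: flag users whose activity is far above the baseline."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    return [
        user
        for user, count in records
        if sigma > 0 and (count - mu) / sigma > z_threshold
    ]


def alert(users):
    """Alert and report service: here, simply format one message per user."""
    return [f"ALERT: abnormal activity for {user}" for user in users]


# Hypothetical inputs: historical daily download counts and today's records.
baseline_counts = [10, 12, 11, 9, 13, 10, 12]
database_log = [("alice", 11), ("bob", 95)]
security_log = [("carol", 12)]

pooled = collect(database_log, security_log)
messages = alert(find_abnormal(pooled, baseline_counts))
print(messages)
```

Unlike a signature lookup, the analytics step here needs no prior knowledge of a specific piece of malware; it only needs a notion of what normal activity looks like, which is why this style of detection is argued to suit previously unseen threats.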
Laitan (2014) makes the case for the use of big data analytics by explaining that the value of
information needed to detect cyber-attacks is time dependent and in certain situations, a
second might be too long. Therefore, a system such as big data analytics that can provide
faster access to an organisation’s own information as well as relevant external information is
much desired. She explains that in the past, cyber-attackers would conduct a careful and
extensive reconnaissance of a target before attacking it, but they no longer do so
because of the stiff resistance that has been mounted by most organisations. To overcome
this resistance, the cyber-attackers attack directly without studying their target for too
long. Also, cyber-attackers are becoming more knowledgeable and there are more of them
attacking organisations.
To deal with these threats, organisations have in the past used several isolated monitoring
and detection systems that have been tuned for different scenarios such as user access, data
loss and financial fraud. This (traditional) approach generated a lot of “noise” and false alarms
in the systems. With big data analytics, Laitan (2014) explains that organisations can: reduce
the noise and false alarms in the existing monitoring systems by supplying them with relevant
data and smarter analytics; pool their internal data and relevant external data into one place
so that they can look for known patterns of cyber-attacks and unusual activities; and remain
agile and stay ahead of the cyber-attackers. Laitan adds that despite the benefits of big data
analytics, it is still early days and most organisations do not have the skills and abilities to
adopt the technology.
In her discussion about the benefits of big data analytics, Everett (2015) focuses on cyber
security and poses the question as to whether big data analytics is really the future “saviour”
of cyber security as some are saying, or whether it is the latest threat. To answer this question,
she presents quotes from interviews with several key security experts from various
organisations, and the consensus amongst these experts is that big data analytics is indeed
the way forward and not a threat to cyber security. The responses from these experts make
quite a compelling case for the relevance of big data analytics for the prevention of cyber-
attacks. Everett makes the point (through one of the experts interviewed) that the result of
an organisation’s data being spread everywhere – from the cloud to personal devices – means
it is difficult to identify where the network perimeter is and how to defend it. Therefore, the
challenge today is more about preventing access to the data itself and not necessarily about
preventing access to the network and the devices on it; this is what Tankard (2012) meant
when he said that the security access controls of big data analytics should be moved away
from the network perimeter and closer to the data asset. Everett adds that identifying breaches
to the network using traditional means is proving to be very difficult because determined
cyber-criminals are more likely to be using legitimate user credentials (probably obtained
through social engineering), which is why big data analytics is of immense benefit. Everett
goes on to explain that big data analytics is not only capable of detecting threats
and behavioural inconsistencies in real time, but could also be used in the future to obtain the
intelligence required to trigger automatic responses to cyber-attacks. However, she adds that
despite the huge potential of big data analytics, it remains the choice of only a small number
of large organisations with a sophisticated security posture and until it is packaged as a
commodity product like anti-virus programs, smaller firms are unlikely to take it up. Like
Cardenas, Manadhata and Rajan (2013), Everett cautions that big data analytics must not be
seen as a silver bullet because the technology is still in its infancy and the expertise or
knowledge in this field is currently limited. A reason for this could be the fact that
organisations that are already using big data analytics are unwilling to share their experiences
because they are worried it might expose their vulnerabilities, she says.
Although the main objective of the paper by Verma et al. (2015) is to explain why cyber security
professionals and students should study data analytics, they raise some very important and
relevant points about the use of big data analytics for cyber security and believe that it is the
right tool needed to handle the cyber security challenges of today. They make a similar point
to Curry et al (2013) about the cleverness of current cyber-attackers and the sophistication
of the malware being released. They explain that malicious adversaries who are clever at
hiding their attacks are the difference between applying big data analytics and traditional data
analytics. In other words, traditional data analytics is simply not enough to handle these
attackers. Another point they make, which echoes Laitan (2014), is that an attack can be
very quick (“fraction of a second”), so the defender needs to react quickly; therefore,
systems (such as big data analytics) that can deal with such time scales are needed.
3.4.2 Part II: models for protecting CIIs
With regard to protecting CIIs, Ulltveit-Moe et al (2013) propose the concept of sharing
information and best practices between computer emergency response teams in Europe, as
proposed by the EU – the so-called European Information Sharing and Alert System (EISAS).
They argue that the sharing of security information such as cyber-attacks and vulnerabilities
between these organisations will not only improve the security of the respective
organisations, but will also reduce the cost of protecting one’s own system. They identify
some challenges to this approach (of information sharing) – such as trust, lack of awareness,
and lack of standards to enforce the privacy of information – and propose methods of how to
address these issues. The method they propose to overcome these challenges is based on the
PRECYSE (Prevention, Protection and Reaction to cyber-attacks to Critical Infrastructures)
project, which is essentially the development of a methodology, architecture and latest
technology and tools (PRECYSE, 2012). To be clear, Ulltveit-Moe et al (2013) are not proposing
PRECYSE as a method for protecting CIIs (although, this is what the PRECYSE project is all
about), but rather adapting its principles as a way of overcoming the challenges (of their
solution) of information sharing. They rightly point out that traditional methods of protecting
CIIs against cyber-attacks such as APTs have been ineffective, and therefore these
(traditional) security systems must be treated as fundamentally insecure. However,
their proposal relies considerably on traditional approaches. For example, they propose the
use of “privacy metrics” in Intrusion Detection Systems (IDS) to reduce the occurrence of false
alarms, but it seems these IDSs are based on traditional data sources.
Another approach to protecting CIIs, which is also based on the PRECYSE project, is offered
by McLaughlin et al. (2014). They make the rather interesting claim that, ultimately, the
information security posture of CII organisations (especially small-to-medium size ones) will
be based on money and not necessarily on how powerful or sophisticated the cyber-attacks
that the organisations face are. They explain that given the current types of attacks (APTs
such as Stuxnet, Duqu, etc.) experienced by organisations, it is only a matter of time before an
“all powerful” adversary defeats security systems, no matter how secure. In other words,
having a “defence-at-all-cost” system does not make a good business case, especially for
small-to-medium size CII organisations. They have rightly identified that the basic strategy
employed by most security systems is perimeter defence, and therefore malicious activities
that have breached the perimeter, and are inside the system, are not likely to be detected.
Hence, the solution they propose, which is essentially an architecture for the detection of
cyber-attacks, consists of an integrated system that includes features such as interior anomaly
detection as well as a perimeter defence, and a countermeasure management system. With
regards to the interior anomaly detection feature, the source of the data to be used for the
detection (of anomalous behaviour) is obtained from both the corporate ICT system (or
network) and the Industrial Control System (ICS). They mention that similar solutions (to what
they are proposing) from commercial organisations employ big data analytics but do not
readily combine data from both the corporate network and ICS systems. From the description
given, it seems that although the data used for security analysis is pooled from the two
systems, it is static and not dynamic. One would not be wrong to think that this makes
their proposed system quite vulnerable, since the speed of an attack can be less than a
second. Also, it seems a lot of manpower and different tools will be required in order to be
successful. Perhaps a big data analytics system (that can pool data from both systems as well
as from external sources) can make such a system more secure in that it will be able to
monitor dynamic (real-time) as well as static data in much less time and with much less
manpower and resources.
Unlike Ulltveit-Moe et al. (2013) and McLaughlin et al. (2014), who discussed technical
solutions for the protection of CIIs, Ma, Smith and Skopik (2013) identify the security
analysis exercise as a key element for the protection of CIIs. This is highly relevant because
regardless of how sophisticated the security system may be, it will be ineffective if it is not
deployed properly; and to be able to do so, a security or risk analysis exercise is needed to
reveal where the vulnerabilities may be. Therefore, in their research article, Ma, Smith and
Skopik (2013) present a thorough description of a CII in the form of an architectural model.
The idea is that in order to identify vulnerabilities in any system, one must understand how
the various components are connected together. The description they present truly simplifies
the complexity of a CII, and although the model is in relation to an industrial system, it does
not conflict with other standalone CIIs such as banking systems.
Although the main focus of their article is the protection of power grids from cyber-attacks
such as APTs, the solution presented by Skopik, Friedberg and Fiedler (2014) can be
adopted for other CIIs. They mention that the effort to modernise and digitise existing power
grids in order to make them more efficient has widened the attack surface for these
infrastructures. They explain that traditional systems, such as IDSs and security
information and event management (SIEM), fail to detect APT attacks because they are
based on signatures of known malware, whereas APTs are designed to exploit
zero-day vulnerabilities. Therefore, the solution they propose avoids signature-based
approaches and relies on statistical analysis of data about system behaviour stored in the
system log files. In other words, data analytics is used to uncover unusual behaviour.
Since the data to be analysed comes only from internal activities, one wonders whether
this will be enough to uncover an attack, since a system can be breached within a second.
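To illustrate what a signature-free, statistical check on log data might look like, the following minimal Python sketch flags observations that deviate strongly from a baseline of normal activity. It is not the method of Skopik, Friedberg and Fiedler (2014); the function name, the z-score threshold of 3 and the sample event counts are assumptions made purely for illustration.

```python
from statistics import mean, stdev


def zscore_flags(baseline, observed, threshold=3.0):
    """Return the indices of observed values that lie more than
    `threshold` standard deviations from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different value is unusual.
        return [i for i, x in enumerate(observed) if x != mu]
    return [i for i, x in enumerate(observed) if abs(x - mu) / sigma > threshold]


# Hypothetical counts of log events per minute during normal operation.
baseline_counts = [12, 11, 13, 12, 10, 11, 12, 13, 11, 12]
# Hypothetical new observations; the third value is an artificial spike.
new_counts = [12, 14, 95, 11]

suspicious = zscore_flags(baseline_counts, new_counts)
print(suspicious)
```

A check of this kind needs no malware signature, but it depends entirely on how representative the baseline is, which is the concern raised above about relying on internal data alone.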
The solution and subsequent model for the protection of a CII proposed by Tsegaye and
Flowerday (2014) are sound, but they are based on traditional methods of protecting information
systems, which are currently proving to be ineffective against zero-day attacks, as observed by
Skopik, Friedberg and Fiedler (2014). The solution consists of three main controls: preventive,
detective and corrective. For the preventive control, they describe instruments or tools such as
policies, firewalls and anti-virus software. Detective tools include anti-virus software and IDS, and
the corrective tools also include anti-virus software and a disaster recovery plan. With this
approach, one cannot help but wonder how the CII will be able to withstand the types of
cyber-attacks organisations are currently facing.
3.4.3 Part III: big data analytics for cyber security in CII
With respect to using big data analytics to protect CIIs, Hurst, Merabti and Fergus (2014)
present a method which involves the use of behavioural observation and big data analysis.
Although this method is not specifically for a CII, it is indirectly included since CIIs are a subset
of CIs. This method, which is actually a type of technology, is designed to monitor the
behaviour of the CI in terms of its data processing and to detect any abnormalities, which
might be a cyber-attack. The system, which was tested with data sets from a simulator, and
analysed with mathematical classification techniques, allowed the researchers to
demonstrate the effectiveness of the method. The system (or method) they propose does not
present a big data analytics tool for the protection of a CI per se, but rather it is about how
extra observations of anomalous behaviour (in the data operations of a CI) can be performed
using big data analytics techniques in order to add to the defence in depth. They also make
the point that the nature of the current threats, combined with the fact that the traditional
methods of defence are not up to the task, means that new and original methods of protecting
CIs are needed.
3.5 Literature review conclusion
As observed by McLaughlin et al. (2014), it is only a matter of time before malware that can
defeat any security system is created. This statement might seem a bit extreme, but
considering the nature of current cyber-attacks and malware (see section 2.6) that have been
experienced, and the fact that cyber-attacks are designed for specific targets, it will not be
amiss to say that CIIs face very dangerous, knowledgeable and well-resourced attackers.
From the review analysis and synthesis, it was quite obvious that, although they are still
relevant, traditional security systems are no match for cyber-attacks launched with APTs. This
is because, as explained by Cardenas, Manadhata and Rajan (2013), traditional security
systems are based on traditional analytics which are based on limited storage, slow speed,
and a specific data type. Therefore, there is a need for a more dynamic system.
Also, in the era of cloud computing and BYOD, the traditional network perimeter does not
really exist since employees can connect to an organisation’s system from any remote
location. Thus, traditional systems, which are effectively designed for network perimeters and
are ‘prevention-centric’ are no longer effective. Consequently, ‘detection-centric’ systems
will be more suitable for the current cyber-attack climate (Brewer, 2015).
The undeniable theme that emerged from this review is that a dynamic system based
on big data analytics is the solution that can handle the current cyber-attack threat
landscape. However, because big data analytics is still in its infancy, not a lot is known about
its power, especially in the way it has been used for cyber security. From the review, here is
what was determined about big data analytics for cyber security:
1. Although all the researchers reviewed extolled the “greatness” of big data analytics as a
tool for cyber security, there was not much evidence about the successes (or failures).
According to Everett (2015), this (lack of evidence) is attributed to the fact that
organisations are unwilling to share their experiences for fear that it might expose their
vulnerabilities. Also, not many organisations have the expertise and resources to utilise it.
2. Research about how big data analytics is or can be used to protect CIIs is limited. Entering
the key phrase, “using big data analytics for the detection of cyber-attacks in critical
information infrastructures”, in the relevant journal databases mentioned above yielded
almost no research papers, as can be observed in Tables 1 and 2. Some of the solutions
presented for the protection of CIIs were in the form of models and frameworks, but there
was certainly no (standardised) model of meaningful practice about how big data analytics
can be deployed to protect CIIs.
3.5.1 Research gaps
From the literature review, one can determine that although big data analytics is seen as a
very powerful and relevant technology for the detection and prevention of cyber-attacks,
there is limited documentation of exactly how effective it is.
Also, there is no abstract model of meaningful practice about how big data analytics is used
for cyber-attack detection. Such a model might help organisations that are new to
the technology to avoid “trial and error” situations; this could prevent potential attacks
and might save them money.
3.5.2 Research question(s)
Given the research gaps outlined above, the main research questions for this study are:
• How effective is big data analytics when used for the detection and prevention of cyber-
attacks?
• In using a BDA technology, what is the overall strategy for deploying the system; is there
a specific methodology (or model) that is followed?
Other related questions are:
• What is the nature of the threat landscape as far as CII organisations are concerned?
• What are the sources of data for the big data analytics technology being used?
4.0 Research Methodology
The selected methodology for this study is quantitative research. With respect to the nature
of the information to be obtained in this research, the data analysis will be conducted by
means of descriptive statistical techniques such as frequency tables; to facilitate this process,
statistical applications such as SPSS will be used. Details of this analysis approach are provided
in the results section (i.e. section 5).
In this section, the justification for selecting this particular methodology as well as the data
collection process, including questionnaire design, are discussed.
4.1 Justification
Quantitative research consists of experimental designs as well as non-experimental designs
such as surveys, but in either case, the main objective is to test or measure the impact of a
treatment, an intervention, a trend or an opinion (Creswell, 2014). The result will provide a
numeric description of the entity (i.e. treatment, trend or opinion) being measured, which
will make it easy to establish its effectiveness (or lack of it).
In this research, the main objective is to measure the effectiveness of big data analytics for
the detection and prevention of cyber-attacks, and the best way to establish this is to ask the
people (or organisations) that use them. Thus, the best methodology for collecting the
relevant data should be quantitative. To be specific, a survey by questionnaire will be
employed in this study.
A subsidiary objective of this study is to develop a model of best practice in the use of big data
analytics for cyber security. It is highly conceivable that organisations that achieve success will
deploy their (big data) technology in similar ways. Based on the similarities of their
operations, a model will be developed. Therefore, since a model (or an artefact) is to be
created, another research methodology considered was Design Research (DR). However, it
was decided that DR would not be suitable for the purposes of this research. This is because one
of the key aspects of the DR methodology is that there must be a clear understanding of the
design problem. This research study is more exploratory in nature as one of the key parts is
to establish the effectiveness of big data analytics for cyber security. In other words, this
research is not intended to solve any known problem but rather to establish the success of an
operation and to describe a generic method (in the form of a model) of how it is done.
A key assumption of this research is that using big data analytics for the detection of cyber-attacks
(especially stealthy attacks) is very effective. However, there is a chance that this
might not be the case, and that this research will reveal a problem (in the method of how the big
data technology is deployed); should that be the case, future work or research could use DR
to solve that problem.
4.2 Questionnaire design
In its basic form, quantitative research by survey is simply a matter of asking the relevant
target audience question(s) about the issue one’s research is about. For example, if a teacher
wants to judge if her lesson was fun, she can simply pose the question, “Did you find this
lesson fun?”, and count the number of “yes” responses. However, if she wants to judge the
effectiveness of her lecture, just asking such a direct question might not yield accurate results.
First, she may have to define what “effectiveness” actually means; is it about whether the
students found the lesson fun or whether they understood the main points of the lesson, or
both of these points. Then, she will have to frame the questions in a way that will allow her
to obtain the information she needs without bias.
This shows that, depending on the nature of the research, question design for a survey
(especially by questionnaire) can be a complex but very important process. The parameters
or variables – which may be defined as the questions or pieces of information that are collected
from the target audience (Open.edu.,2014) – of the main research question must be carefully
thought through, and the wording of the questions must be clear, devoid of ambiguity and
must reflect the parameters. In other words, the questions one asks are as important
as how they are asked.
With regards to deciding what questions to ask, Leung (2001) explains that there are three
main types of information to be obtained:
• The first type is the main information one is seeking to obtain from the chosen target
audience. These are known as the dependent parameters.
• The second type is the information that might bring more meaning to the main information
(i.e. the dependent parameters). These are known as the independent parameters.
• The third type refers to external factors that might affect or distort the final results. These
are called confounding parameters.
In order to determine the parameters (or questions) of this research, the context for the word
“effectiveness” in the research question (see section 3.5.2) was first established. Considering
the fact that stealth attacks are the most difficult, if not impossible, to detect by
traditional methods, it was decided that (based on the nature of an APT and the speed of an
attack) an effective BDA system should have the following attributes:
• Able to detect stealth attacks significantly more often than not, i.e. at least 75% of the time.
• Able to detect attacks within seconds, minutes or hours, but in no more than a
day.
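The two attributes above can be expressed as a simple check. The sketch below is illustrative only: the function and constant names are hypothetical, and the 75% figure is read here as a minimum detection rate.

```python
from datetime import timedelta

# Thresholds taken from the two attributes listed above.
MIN_DETECTION_RATE = 0.75               # assumed reading: share of stealth attacks detected
MAX_DETECTION_TIME = timedelta(days=1)  # detection should take no more than a day


def meets_effectiveness_criteria(detection_rate, detection_time):
    """Return True if a reported BDA system satisfies both attributes."""
    return (detection_rate >= MIN_DETECTION_RATE
            and detection_time <= MAX_DETECTION_TIME)


# Hypothetical respondent: detects 80% of stealth attacks, usually within 3 hours.
print(meets_effectiveness_criteria(0.80, timedelta(hours=3)))
```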
Having established the context by which an effective BDA system will be determined, an
outline of the dependent parameters, as well as independent parameters, for the
questionnaire with their corresponding justification are presented in Table 3. For the actual
questionnaire, including details of the options for each question, see Appendix i.
No. | Parameter | Rationale
1 | What is the size of your company? | Given that currently, only large global companies use BDA for cyber security, this question will serve two purposes: confirm that BDA for cyber security is indeed exclusive to large organizations; establish the legitimacy of the respondent in terms of whether he/she has indeed used a BDA system for cyber security.
2 | What is your job role? | This question is also to establish the legitimacy of the respondent. If they have no cyber security or data science credentials, it is highly unlikely that they will have any knowledge or experience about BDA systems, given that the technology is still in its infancy.
3 | How long have you been using a big data analytics system for cyber security? | BDA technology is still in its infancy so it is highly unlikely that anyone will have more than 5 years of experience in its use.
4 | Do you use your big data analytics system together with traditional cyber security systems (such as intrusion detection systems, intrusion prevention systems, anti-virus programs, firewalls, etc.)? | The purpose of this question is to determine how organisations deploy a BDA system for cyber security.
5 | Have you completely replaced your traditional cyber security systems with a big data analytics system? | This question is a follow up on the previous one, and it is to ensure that the respondent is taking the survey seriously and not just selecting random answers. For example, ‘yes’ and ‘yes’ responses for questions 4 and 5 respectively signal that the respondent might not be taking the survey seriously because one cannot have the two situations occurring at the same time.
6 | What type of big data analytics processing do you perform? | The purpose of this question is to determine the nature of the BDA system that is being used for cyber security.
7 | Where do you source your data from, for your security analytics? | The purpose of this question is to determine how organisations deploy a BDA system for cyber security.
8 | What type of big data analytics technology do you use? | The purpose of this question is to determine the type of the BDA system that is being used for cyber security. Perhaps a particular one of choice for the range of organisations surveyed might indicate that it is more effective than the others.
9 | Which of these big data storage systems or databases do you use? | The type of storage system being used can affect the speed of the analytics process. Perhaps a particular one of choice for the range of organisations surveyed might indicate that it is more effective than the others.
10 | What percentage of the cyber-attacks you have experienced do you consider as targeted? | This is to establish the level of targeted attacks against the organisation.
11 | What percentage of these targeted attacks do you consider as stealth attacks (e.g. advanced persistent threats)? | Targeted attacks can be in the form of DDoS, spear phishing, etc. so the purpose of this question is to get an idea of how much of the targeted attacks are stealthy.
12 | Are you able to detect these stealth attacks with your big data analytics system? | This is to establish the effectiveness of the BDA system.
13 | About what percentage of these stealth attacks are you able to detect with your big data analytics system? | This is to establish the effectiveness and efficiency of the BDA system.
14 | How long does it usually take you to detect these stealth attacks? | This is to establish the effectiveness of the BDA system.
15 | Overall, using a big data analytics system has enhanced your ability to detect stealth attacks. | This is to establish the effectiveness of the BDA system.
16 | Overall, using a big data analytics system has made your cyber security operations more efficient. | This is to establish the effectiveness of the BDA system.
Table 3 - parameters of the questionnaire with justification
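The consistency check described for questions 4 and 5 could be automated along the following lines. This is a sketch under assumed field names (`q4`, `q5`, `id`) and assumed 'yes'/'no' answer codes; the export format of the survey platform is not specified in this study.

```python
def is_inconsistent(response):
    """Flag a response that claims BDA is used together with traditional
    systems (Q4 = yes) and has also completely replaced them (Q5 = yes)."""
    return response.get("q4") == "yes" and response.get("q5") == "yes"


# Hypothetical responses, for illustration only.
responses = [
    {"id": 1, "q4": "yes", "q5": "no"},
    {"id": 2, "q4": "yes", "q5": "yes"},  # contradictory answers
    {"id": 3, "q4": "no", "q5": "yes"},
]

flagged = [r["id"] for r in responses if is_inconsistent(r)]
print(flagged)
```

Flagged responses would then be reviewed, or excluded from the analysis, at the researcher's discretion.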
4.2.1 Target audience and sample size
According to a Gartner report, only large global organisations can currently afford to use BDA
for cyber security (Laitan, 2014). Therefore, the target audience for this research is large
organisations whose operations are critical in nature to a country’s economy; specifically, the
questionnaire will be directed at cyber security professionals that work for these large
organizations.
Also, given that only an estimated 25 percent of these large global companies are likely to use
BDA, the sample size for this research is likely to be relatively small. However, it is important
that an appropriate sample size is used, and several methods are available for determining it.
Bartlett, Kotrlik and Higgins (2001) discuss a range of these methods for determining the
sample size, but it seems the best approach involves the use of two main variables (or
statistics), namely, confidence interval (or margin of error) and confidence level.
The confidence interval is the acceptable range for which one’s estimated value (or result)
will be considered accurate (Open.edu.,2014). For example, if one sets the margin of error to
be 10% and the estimated value after the research is 80%, then the accuracy of this value will
be plus or minus 10%, which works out to be in the range, 70% to 90%.
The confidence level indicates how certain one can be that the result falls within the
confidence interval. A common value that is often used is 95%, which means that there is
a 5% chance that the result will be outside the confidence interval (Open.edu.,2014).
Another statistic which is used in determining the sample size is the z-score; the margin of
error is directly proportional to it. To put it simply, it gives an idea of the level of
deviation from the value determined (from the result). One can determine the z-score for a
given confidence level by looking at a standard chart.
The formula for calculating the sample size, which involves these variables, is derived from
the formula for calculating the margin of error, as shown in Figures 7 and 8.
\[ ME = Z \sqrt{\frac{P(1 - P)}{n}} \]
Figure 7 - mathematical equation for calculating margin of error (ME)
\[ n = \frac{Z^{2} P(1 - P)}{ME^{2}} \]
Figure 8 - mathematical equation for calculating sample size n, derived from the ME equation
In both of these equations (shown in Figures 7 and 8), ME is the margin of error; Z is the
z-score; P is the estimate of the proportion of large global organisations that use a BDA system
for cyber security; and n is the sample size to be determined.
For this research, the ME is set to 10% (i.e. ME is 0.1) because the number of large
global companies that use BDA for cyber security is not exactly known, and the figure given is
only a prediction (albeit from a reputable organisation). Also, although the questionnaire will
be directed at large global organisations, they will not be contacted directly, since they are
not known to this researcher. For this same reason, the confidence level for this research
was chosen to be 90%. Looking at a z-score chart (http://www.stat.ufl.edu/, n.d), the z-score
for this value is 1.29.
As mentioned earlier, it is projected that about 25 percent of large global companies
will adopt big data technologies for cyber security or fraud detection (Rivera, 2014). This
means that the value of P (for the equation shown in Figure 8) for this research is 0.25.
Applying these values (of ME, Z and P) to the equation in Figure 8, the sample size n for this
research works out to be 31.
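As a worked check of this calculation, the sketch below evaluates the margin-of-error and sample-size equations with the values quoted above (ME = 0.1, Z = 1.29, P = 0.25). The function names are illustrative, and rounding to the nearest whole number is an assumption, since the rounding rule is not stated.

```python
import math


def margin_of_error(z, p, n):
    """ME = Z * sqrt(P(1 - P) / n)."""
    return z * math.sqrt(p * (1 - p) / n)


def sample_size(z, p, me):
    """n = Z^2 * P(1 - P) / ME^2, obtained by rearranging the ME equation."""
    return (z ** 2) * p * (1 - p) / (me ** 2)


# Values as quoted in the text.
n_exact = sample_size(z=1.29, p=0.25, me=0.1)
n_rounded = round(n_exact)  # assumed rounding to the nearest integer
print(n_exact, n_rounded)
```

With these inputs the equation gives approximately 31.2, which rounds to the 31 reported above. Because n scales with the square of Z, a different z-score would change the required sample size accordingly.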
4.2.2 Data collection
After the questionnaire had been designed and created, it was tested with the relevant target
audience. To do this, the questionnaire was sent to five cybersecurity professionals who, to
provide feedback, were asked to respond to a few directed questions
about the questionnaire. Although three of them confessed that they were
not really familiar with the subject content (of big data analytics), they all agreed that the
questions were clear and easy to understand. Also, they all agreed that the use of closed
questions made the survey more user friendly and easy to complete.
To collect the data (from the identified target audience), the questionnaire was deployed on the
online survey service platform on which it was created. This platform had a facility which allowed a specific
target audience to be selected. For the purposes of this research, the audience selected
included people that worked in the technology, telecommunications and Internet industries.
Also, different information security groups on the LinkedIn social media platform were
targeted. Finally, to ensure that every option (for obtaining feedback from the target
audience) was explored, emails were also sent to top security firms. The outcome and analysis
of the data collected are presented in “The Results” section (i.e. section 5).
5.0 The Results
Since the parameters of the questionnaire consist of groups of categories (e.g. less than a
year; between 1 to 3 years; between 3 to 5 years), the data collected can be summarised and
described by determining how many times a category occurs. This technique for
analysing quantitative data is called frequency analysis.
Therefore, in this section, the data collected in the survey is analysed using frequency analysis.
This means that, for each question, the number of times a category occurs (in the responses)
is counted and calculated as a relative percentage. These are presented in a frequency table
as well as in a pie chart. Also, the meaning and conclusions drawn from this analysis are
discussed; this includes the limitations and accuracy of the survey as well as the consequent
model of meaningful practice of using BDA for cyber security.
5.1 Analysis
For each question, the findings of the survey are analysed and their implications discussed.
Q1: What is the size of your company?
Answer choices No. of responses Percentage
1 – 4 employees 9 21%
5 – 9 employees 2 5%
10 – 19 employees 2 5%
20 – 99 employees 4 10%
100 – 499 employees 7 16%
500 – 999 employees 4 10%
1000 – 4999 employees 2 5%
5000 – 9999 employees 6 14%
10,000 + employees 7 16%
Total 43 100%
Table 4 - Frequency analysis for Q1
Chart 1: this chart shows the percentage distribution of the responses to question 1.
The rationale behind question 1 was to confirm whether BDA for cyber security is indeed exclusive
to large organisations. Looking at the responses to this question (see Chart 1 and frequency
table, Table 4), it seems BDA is not exclusive to large organisations; only 35% of the respondents
(i.e. those at companies with 1,000 or more employees) work for large organisations.
However, the outcome of question 2 reveals that about 35% (see Table 5 and Chart 2) of the
respondents are cyber security professionals. This might be an indication that these cyber
security professionals are the same 35% that indicated that they work for a large organisation.
If so, BDA is indeed exclusive to large organisations.
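The share of respondents counted as working for a large organisation depends on where the cut-off is drawn. The following is a minimal sketch in Python using the counts from Table 4; the function name share_at_or_above and the two cut-offs (500 and 1,000 employees) are assumptions chosen for illustration.

```python
# Counts from Table 4, keyed by the lower bound of each company-size band.
q1_counts = {1: 9, 5: 2, 10: 2, 20: 4, 100: 7, 500: 4, 1000: 2, 5000: 6, 10000: 7}


def share_at_or_above(counts, threshold):
    """Percentage of respondents whose company-size band starts at or above threshold."""
    total = sum(counts.values())
    selected = sum(n for lower, n in counts.items() if lower >= threshold)
    return round(100 * selected / total)


share_500 = share_at_or_above(q1_counts, 500)
share_1000 = share_at_or_above(q1_counts, 1000)
print(share_500, share_1000)
```

With these counts, a cut-off of 1,000 employees gives 35% of the respondents, whereas a cut-off of 500 employees gives 44%.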
Q2: What is your job role?
Answer choices No. of responses Percentage
Information security officer 2 5%
Information security analyst 5 12%
Information security manager 0 0%
Information security auditor 1 2%
Information security consultant 4 9%
IT/Network manager 1 2%
IT/Network administrator 0 0%
Chief information security officer (CISO) 0 0%
Chief information officer (CIO) 2 5%
Other (please specify) 28 65%
Total 43 100%
Table 5 - Frequency analysis for Q2
Chart 2: this chart shows the percentage distribution of the responses to question 2.
In Chart 2 and Table 5, one can observe that the percentage of respondents that chose “other”
(65%) is quite significant, and a further investigation of this group might be
revealing; there is a chance that the background of these respondents (that chose “other”) might
suggest no knowledge or experience in using BDA systems for cyber security.
Q3: How long have you been using big data analytics system for cyber security?
Answer choices No. of responses Percentage
Less than a year 10 23%
Between 1 to 3 years 9 21%
Between 3 to 5 years 1 2%
More than 5 years 1 2%
Other (please specify) 22 51%
Total 43 100%
Table 6 - Frequency analysis for Q3
Chart 3: this chart shows the percentage distribution of the responses to question 3.
The chart and frequency table for Q3 (Chart 3 and Table 6) show that nearly half of
the respondents (about 47%) have been using BDA systems for 5 years or less. It seems that the same
respondents that selected “other” for question 2 also selected “other” for this question. This
might indicate that these respondents have little or no knowledge of BDA technologies.
Q4: Do you use your big data analytics system together with traditional cyber security
systems (such as intrusion detection systems, intrusion prevention systems, anti-virus
programs, firewalls, etc.)?
Answer choices No. of responses Percentage
Yes 19 48%
No 21 52%
Total 40 100%
Table 7 - Frequency analysis for Q4
Chart 4: this chart shows the percentage distribution of the responses to question 4.
Comments on the responses to Q4 are combined with those on Q5 below, since the two
questions are connected.
Q5: Have you completely replaced your traditional cyber security systems with a big data
analytics system?
Answer choices No. of responses Percentage
Yes 5 12%
No 35 88%
Total 40 100%
Table 8 - Frequency analysis for Q5
Chart 5: this chart shows the percentage distribution of the responses to question 5.
Combining the responses for questions 4 and 5, one can observe (from Charts 4 and 5 as well
as Tables 7 and 8) that most organisations (88%) have not replaced their traditional cyber security
systems, and about half (48%) deploy their BDA cyber security systems in conjunction with them.
This suggests that combining the two is perhaps the best approach. This makes sense because the
more security event data one can analyse, the more likely one is to detect inconsistencies and
threats. It must be mentioned here that the speed of the analytics is also essential.
Q6: What type of big data analytics processing do you perform?
Answer choices No. of responses Percentage
Batch 6 17%
Stream/Real-time 7 20%
Both (Batch and Stream) 22 63%
Total 35 100%
Table 9 - Frequency analysis for Q6
Chart 6: this chart shows the percentage distribution of the responses to question 6.
Comments on the responses to Q6 are combined with those on Q7 below, since the two
questions are connected.
Q7: Where do you source your data from, for your security analytics?
Answer choices No. of responses Percentage
Internally – from traditional systems such as SIEM 8 22%
Externally – from cyber security events on the Internet 4 11%
Both internal and external sources 24 67%
Total 36 100%
Table 10 - Frequency analysis for Q7
Chart 7: this chart shows the percentage distribution of the responses to question 7.
The responses for questions 6 and 7 (see Charts 6 & 7 as well as Tables 9 & 10) suggest that
the meaningful practice is to use data from both internal and external sources for performing
both batch and real-time analytics. In other words, both batch and real-time analytics are
necessary for detecting cyber-attacks, and data should be obtained from any source, so
long as it is relevant.
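To make the distinction between batch and real-time (stream) analytics concrete, the following is a minimal sketch in Python. It is not taken from any of the BDA products covered by the survey: the event format, the IP addresses, the thresholds and the function names batch_top_sources and stream_alerts are all assumptions made for illustration. Internal events stand in for data collated by a tool such as a SIEM, and the set of addresses stands in for an external source.

```python
from collections import Counter, deque

# Hypothetical internal events (e.g. collated by a SIEM): (timestamp_seconds, source_ip).
internal_events = [(0, "10.0.0.5"), (1, "10.0.0.7"), (2, "10.0.0.5"),
                   (3, "10.0.0.5"), (50, "203.0.113.9"), (120, "10.0.0.7")]
# Hypothetical external feed of addresses reported as malicious.
external_bad_ips = {"203.0.113.9"}


def batch_top_sources(events, min_count):
    """Batch analytics: scan all stored events and return frequently seen sources."""
    counts = Counter(ip for _, ip in events)
    return {ip for ip, n in counts.items() if n >= min_count}


def stream_alerts(events, window_seconds, burst_size):
    """Stream analytics: flag an event as it arrives if its source is on the
    external feed or has produced a burst within the sliding time window."""
    recent = deque()  # events still inside the window
    alerts = []
    for ts, ip in events:
        recent.append((ts, ip))
        # Drop events that have fallen out of the time window.
        while recent and recent[0][0] <= ts - window_seconds:
            recent.popleft()
        in_window = sum(1 for _, seen in recent if seen == ip)
        if ip in external_bad_ips or in_window >= burst_size:
            alerts.append((ts, ip))
    return alerts


batch_result = batch_top_sources(internal_events, min_count=3)
stream_result = stream_alerts(internal_events, window_seconds=10, burst_size=3)
print(batch_result, stream_result)
```

In this sketch the batch pass only sees the full picture after all events are stored, whereas the stream pass raises an alert at the moment the third event from the same source arrives, and flags the externally reported address immediately even though it appears only once internally.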
Q8: What type of big data analytics technology do you use?
Answer choices No. of responses Percentage
S4 1 3%
Hama 0 0%
Hadoop 8 22%
Storm 4 11%
Spark 7 19%
Other (please specify) 16 44%
Total 36 100%
Table 11 - Frequency analysis for Q8
Chart 8: this chart shows the percentage distribution of the responses to question 8.
Comments on the responses to Q8 are combined with those on Q9, since the two questions
are connected.
Q9: Which of these big data storage systems or databases do you use?
Answer choices No. of responses Percentage
CouchDB 4 10.5%
MongoDB 3 7.9%
HBase 8 21.1%
Cassandra 2 5.3%
Giraph 0 0%
CouchBase 0 0%
Riak 2 5.3%
Redis 0 0%
Neo4j 0 0%
OrientDB 1 2.6%
Other (please specify) 18 47.4%
Total 38 100%
Table 12 - Frequency analysis for Q9
Chart 9: this chart shows the percentage distribution of the responses to question 9.
The rationale behind questions 8 and 9 (see Charts 8 & 9 as well as Tables 11 & 12) was to
establish which technology is most widely used, and therefore presumably the most efficient and
user-friendly. The results are inconclusive; perhaps these questions were not necessary since an
organisation’s choice of BDA technology will depend on its needs.
Q10: What percentage of the cyber-attacks you have experienced do you consider as
targeted?
Answer choices No. of responses Percentage
Less than 10% 18 42%
Between 10% and 20% 7 16%
Between 20% and 30% 3 7%
Between 30% and 40% 0 0%
Between 40% and 50% 2 5%
More than 50% 1 2%
Other (please specify) 12 28%
Total 43 100%
Table 13 - Frequency analysis for Q10
Chart 10: this chart shows the percentage distribution of the responses to question 10.
Looking at the frequency table (Table 13) and chart (Chart 10) for question 10, it
seems different organisations experienced different levels of targeted attacks, although the
largest group (42%) considered less than 10% of the attacks they experienced to be targeted.
Q11: What percentage of these targeted attacks do you consider as stealth attacks (e.g.
advanced persistent threats)?
Answer choices No. of responses Percentage
Less than 10% 18 42%
Between 10% and 20% 5 12%
Between 20% and 30% 1 2%
Between 30% and 40% 1 2%
Between 40% and 50% 4 9%
More than 50% 2 5%
Other (please specify) 12 28%
Total 43 100%
Table 14 - Frequency analysis for Q11
Chart 11: this chart shows the percentage distribution of the responses to question 11.
The outcome of this analysis suggests that stealthy attacks may not be as common as some
might think.
Q12: Are you able to detect these stealth attacks with your big data analytics system?
Answer choices No. of responses Percentage
Yes 20 47%
No 23 53%
Total 43 100%
Table 15 - Frequency analysis for Q12
Chart 12: this chart shows the percentage distribution of the responses to question 12.
This analysis seems to suggest that only about half of the respondents (47%) are able to detect
stealthy attacks with their BDA systems (as shown in Chart 12 and Table 15).
Q13: About what percentage of these stealth attacks are you able to detect with your big
data analytics system?
Answer choices No. of responses Percentage
Less than 10% 10 26%
Between 10% and 20% 1 3%
Between 20% and 30% 2 5%
Between 30% and 40% 2 5%
Between 40% and 50% 1 3%
More than 50% 12 31%
Other (please specify) 11 28%
Total 39 100%
Table 16 - Frequency analysis for Q13
Chart 13: this chart shows the percentage distribution of the responses to question 13.
For a BDA system for cyber security to be effective, it was determined that it should be able to
detect stealthy attacks at least 75% of the time (see section 4.2). This result seems to suggest
that the success rate is more than 50%, since “More than 50%” was the most frequently chosen
answer (31%). With hindsight, this question should have
had the choice of “More than 75%”. Since “More than 75%” falls into the “More than 50%”
set, one can conclude that a BDA system is effective.
Q14: How long does it usually take you to detect these stealth attacks?
Answer choices No. of responses Percentage
Seconds 7 16%
Minutes 13 30%
Hours 6 14%
Days 2 5%
Weeks 1 2%
Months 1 2%
Other (please specify) 13 30%
Total 43 100%
Table 17 - Frequency analysis for Q14
Chart 14: this chart shows the percentage distribution of the responses to question 14.
Another measure of effectiveness of a BDA system for cyber security was determined to be
that it should be able to detect stealthy attacks within seconds, minutes or hours (see
section 4.2). This result (as shown in Chart 14 and Table 17) seems to suggest that it most often
takes a matter of minutes to detect stealth cyber-attacks using a BDA system. In other words, this
result suggests that a BDA system for cyber security is effective in detecting stealthy
cyber-attacks.
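The "within seconds, minutes or hours" criterion can be checked against Table 17 by accumulating the counts in order of increasing detection time. The following is a minimal sketch in Python; the function name cumulative_share is an assumption, and the percentages are taken over all 43 respondents, including the 13 who answered "Other".

```python
# Counts from Table 17 (Q14), in order of increasing detection time.
q14_counts = [("Seconds", 7), ("Minutes", 13), ("Hours", 6),
              ("Days", 2), ("Weeks", 1), ("Months", 1)]
total_respondents = 43  # includes the 13 "Other" answers


def cumulative_share(counts, total):
    """Running percentage of respondents who detect within each time unit or faster."""
    running, rows = 0, []
    for label, n in counts:
        running += n
        rows.append((label, round(100 * running / total)))
    return rows


cumulative = cumulative_share(q14_counts, total_respondents)
for label, pct in cumulative:
    print(f"within {label.lower()}: {pct}%")
```

On these counts, about 60% of all respondents report detection within hours or faster.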
Q15: Overall, using a big data analytics system has enhanced your ability to detect stealth
attacks.
Answer choices No. of responses Percentage
Yes 23 53%
No 20 47%
Total 43 100%
Table 18 - Frequency analysis for Q15
Chart 15: this chart shows the percentage distribution of the responses to question 15.
Comments on the responses to Q15 are combined with those on Q16, since the two questions
are connected.
Q16: Overall, using a big data analytics system has made your cyber security operations
more efficient.
Answer choices No. of responses Percentage
Yes 21 49%
No 22 51%
Total 43 100%
Table 19 - Frequency analysis for Q16
Chart 16: this chart shows the percentage distribution of the responses to question 16.
The rationale behind questions 15 and 16 was to establish the effectiveness of BDA systems
for cyber security in a direct manner. The responses to these questions (as shown in Charts
15 & 16 as well as Tables 18 & 19) show a roughly 50-50 split. In other words, about 50% of the
respondents agree that a BDA system is effective in dealing with stealth attacks whilst the other
50% disagree. Perhaps the reason for this split outcome is inexperience of use, or some
of the respondents were simply guessing since they do not have any knowledge of the
technology.
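One way to judge whether a 53/47 or 49/51 result differs meaningfully from an even split is to put a confidence interval around the observed proportion. The sketch below, in Python, uses the normal-approximation (Wald) interval; this calculation is not part of the original analysis, and the function name proportion_ci and the 95% confidence level (z = 1.96) are assumptions chosen for illustration.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.
    z=1.96 corresponds to a 95% confidence level."""
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# "Yes" counts from Table 18 (Q15) and Table 19 (Q16), 43 respondents each.
q15_low, q15_high = proportion_ci(23, 43)
q16_low, q16_high = proportion_ci(21, 43)
print(f"Q15: {q15_low:.2f} to {q15_high:.2f}")
print(f"Q16: {q16_low:.2f} to {q16_high:.2f}")
```

Both intervals include 0.5, which is consistent with reading the responses as an even split rather than as a majority either way.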
5.2 The model
The outcome of the survey, especially questions 4 and 5, shows that BDA systems are mostly
used in conjunction with traditional security systems. Combining this outcome with
information from literature (see section 2.0), a model of how a BDA security system can be
deployed was developed; this is presented in Figure 9, below.
In this model, the concept of defence in depth (DiD) is applied whereby traditional
information security tools are used as the first level of defence. At this level, data generated
by using traditional security detection systems such as NIDS/NIDPS, Antivirus, firewalls, etc.
can be collated with a SIEM tool and used as a source of (internal) data for the BDA security
system which forms the second level of defence. Since the BDA security system is more
powerful in that it can crunch much more data and at a much faster rate than traditional
systems, it is important that the BDA system is placed closer to the data asset to be protected.
This way, if a stealth cyber-attack is missed by the first level of defence (i.e. SIEM tool) due to
lack of processing time (as attacks can occur in a matter of seconds) and relevant data, there
is a higher probability that the BDA security system will be able to detect it, since it has access to
more relevant data.
In a nutshell, this model can be described as follows: common cyber-attacks, such as known
malware and general hack attacks, can be detected by the hardened traditional systems; if a
sophisticated and stealthy attack manages to bypass this first level of defence (that is, the
traditional system), it will have to deal with the BDA security system, which is much more
formidable. In a worst-case scenario where the attack still manages to breach the BDA security
system, this system should still be able to detect the attack before it is too late (a day
might be too late) and too much damage is caused.
Figure 9 – a derived model showing how a BDA security system can be deployed
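The two levels of defence in this model can be sketched as a simple pipeline in Python. This is an illustration of the layering only, not an implementation of any real product: the signature set, the external indicator set, the event fields and the function names first_level, second_level and classify are all hypothetical, and the second-level rule is a placeholder for real analytics.

```python
# Hypothetical signatures known to the traditional tools (antivirus, NIDS, firewall).
KNOWN_SIGNATURES = {"known-malware-hash"}
# Hypothetical external intelligence available only to the BDA layer.
EXTERNAL_INDICATORS = {"198.51.100.23"}


def first_level(event):
    """Traditional layer: catch events that match a known signature."""
    return event["signature"] in KNOWN_SIGNATURES


def second_level(event, history):
    """BDA layer: correlate the event with stored internal data and external
    indicators. The rule used here (unseen destination plus external match)
    is a placeholder for real analytics."""
    unseen_destination = event["dest_ip"] not in {e["dest_ip"] for e in history}
    return unseen_destination and event["dest_ip"] in EXTERNAL_INDICATORS


def classify(event, history):
    """Pass the event through both layers, outermost first."""
    if first_level(event):
        return "blocked by traditional layer"
    if second_level(event, history):
        return "detected by BDA layer"
    history.append(event)  # allowed events are stored for later correlation
    return "allowed"


history = [{"signature": "none", "dest_ip": "192.0.2.10"}]
events = [
    {"signature": "known-malware-hash", "dest_ip": "192.0.2.10"},
    {"signature": "none", "dest_ip": "198.51.100.23"},
    {"signature": "none", "dest_ip": "192.0.2.10"},
]
results = [classify(e, history) for e in events]
print(results)
```

The second event carries no known signature, so it passes the first level, but the second level flags it because it has more data to correlate against (stored internal history and an external indicator).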
6.0 Conclusion
Targeted cyber-attacks against CIIs such as international financial systems and national
communication systems are on the rise, and a breach of such a system could be devastating
to a country’s economy or even trigger a war. The level of sophistication and stealthy nature
of these targeted attacks as well as the success rate (one breach is too many) demonstrate
that traditional cyber security systems are proving to be ineffective. Thus more innovative
solutions for the detection and prevention of stealthy and targeted cyber-attacks are needed.
This research explored one such solution, which is the use of BDA systems.
In the literature review, it was established that BDA systems are capable of meeting this
challenge, and perhaps they provide the best solution for protecting CIIs. However, it was not
clear just how effective these BDA systems are (in detecting and preventing cyber-attacks),
since it is still early days for this technology. Also, the literature review did not reveal any
established method or strategy of using a BDA system for the detection and prevention of
cyber-attacks. Therefore, this study sought to answer these particular questions:
1. How effective are BDA systems for detecting and preventing cyber-attacks?
2. Is there a model of best practice for deploying a BDA system for cyber security?
With respect to the first question, the survey results were inconclusive with about 50% of the
respondents agreeing that the use of BDA systems for detecting and preventing cyber-attacks
is effective (as shown in the analysis of Q15 and Q16 in section 5) and the other 50%
disagreeing. Also, going by the analysis of Q13 and Q14 (see section 5), a BDA security system
is an effective tool for detecting stealthy attacks. Therefore, one can conclude that although
the jury is still out on the effectiveness of BDA security systems, compared to traditional
security systems, they are definitely an effective tool for detecting targeted and stealthy
cyber-attacks and must be used for the protection of CIIs.
For the second question, the research (literature review and survey) revealed that the
meaningful practice is to apply the concept of DiD by using traditional existing security
systems in conjunction with the BDA security system. In other words, the best defence
strategy for protecting CIIs is to have a BDA system that is capable of performing both batch
and real-time analytics. The idea is that the data generated by users (authorised or
not) should be stored and combined with real-time and static data from external sources. This
large and diverse volume of data can then be mined quickly (using a BDA system) for security
events. A model (see Figure 9) depicting this process is presented in section 5. This model
shows that the BDA system must be placed closest to the data item to be protected, followed
by the traditional security systems. The traditional security systems could also be organised
into multiple layers.
6.1 Research limitations
The main limitation of this study was a lack of resources in terms of finance, clout and time
during the data collection process. As already mentioned, the survey was created and
deployed with an online survey service. This online survey service provides a range of
different types of audiences based on their backgrounds. The target audience selected on this
platform was technology, networking, telecoms and media professionals, but this process
proved to be a rather challenging one with very few responses. The idea was to tap into the
audience database (for the general public) possessed by this online survey company, hoping
to attract only the people with the relevant knowledge (to complete the survey), but despite
the backgrounds of the respondents, their responses showed limited knowledge and
experience about big data analytics.
To get around this problem of limited responses and lack of relevant knowledge from the
respondents, specific security professional audiences were targeted. To do this, web links to
the survey were placed on information security specialised groups on a business oriented
social networking platform (i.e. LinkedIn). Also, emails (with the web links to the survey) were
sent to about 50 specific cyber security firms. With these two approaches, we managed to
collect data from more respondents than the required sample size.
Due to the nature of this research, which involves large organisations and technologies still in
their infancy, it was anticipated that such problems of lack of knowledge and limited responses
would occur. Therefore, although there is not much one can do to get more responses (especially
if one lacks resources), to ensure a certain level of accuracy, the questioning was designed in
a way that would inform the researcher as to whether responses should be trusted or not.
For example, looking at the responses for Q2 (see Table 5), since a substantial number of
respondents have ‘other’ credentials, it was necessary to examine what their responses were.
Upon examination, it turned out that some of them had a background that could indicate some
knowledge about BDA systems, but the rest of their responses showed limited
knowledge of this field.
Hence, given these limitations, one can conclude that the results could have been of higher
accuracy and more conclusive if the targeted BDA and security community groups
had been approached directly, that is, by getting them to complete the questionnaires through
direct interviews.
6.2 Future research
This research, at the very least, has laid out the theoretical foundation of how one can
measure the success rate, and thus the effectiveness of using BDA for security. Also, a model (or
method) of meaningful practice for deploying BDA systems as part of a DiD strategy has been
produced. However, due to the limitations of the survey (see section 6.1), a future study could
be based on improving the data collection process, thus establishing unequivocally the
effectiveness of using BDA systems for cyber security.
Also in this research, it was determined that a range of BDA technologies have been and are
being developed for cyber security. Future research could take an in-depth look at these
technologies, measure and rank their effectiveness, and describe the nature of the
organisation each is best suited to. In other words, a comparative study of the various
technologies can be conducted. This will make it easier for organisations to know what to do
if they wish to upgrade their security systems to BDA systems.
Recently, MIT scientists have developed an AI-enabled BDA system that is capable of
detecting (with some human help) cyber-attacks three times more effectively than current
systems (Shead, 2016). Further research could look more closely at this AI system in order to
determine how it is deployed, and whether it is cost-effective. The study could seek to answer
questions such as: what kind of cyber-attacks does this system detect? Is it able to detect
stealthy attacks? What is the speed of detection? Is this the only system of its kind or are
there others? Also, another issue, related to this concept of AI, that can be explored is the
practicality and implications of fully automating cyber-security operations.
7.0 References
1. Bartlett, J., Kotrlik, J. and Higgins, C. (2001). Organizational Research: Determining
Appropriate Sample Size in Survey Research. Information Technology, Learning, and
Performance Journal, 19(1).
2. Brewer, R. (2015). Cyber threats: reducing the time to detection and response. Network
Security, 2015(5), pp.5-8.
3. Cardenas, A., Manadhata, P. and Rajan, S. (2013). Big Data Analytics for Security. IEEE
Security & Privacy, 11(6), pp.74-76.
4. Cavelty, M. and Suter, M. (2012). The Art of CIIP Strategy: Taking Stock of Content and
Processes. Centre for security studies, pp.27 - 36.
5. Cloud Security Alliance, (2013). Big Data Analytics for Security Intelligence. [online]
Cloud Security Alliance. Available at:
https://downloads.cloudsecurityalliance.org/initiatives/bdwg/Big_Data_Analytics_for_S
ecurity_Intelligence.pdf [Accessed 24 Apr. 2016].
6. Creswell, J. (2014). Research design: Qualitative, quantitative, and mixed methods
approaches. 4th ed. California: Sage, pp.3 - 24, 155 - 182.
7. Dugan, K. (2014). Regulator sees cyber attacks on banks causing ‘Armageddon’. [online]
New York Post. Available at: http://nypost.com/2014/09/22/regulator-sees-cyber-
attacks-on-banks-causing-armageddon/ [Accessed 5 Mar. 2016].
8. Elgendy, N. and Elragal, A. (2014). Big Data Analytics: A Literature Review Paper.
Springer, 8557, pp.214 - 227.
9. Eschelbeck, G. (2014). Smarter, Shadier, Stealthier Malware. Security Threat Report.
[online] Sophos. Available at: https://www.sophos.com/en-
us/medialibrary/PDFs/other/sophos-security-threat-report-2014.pdf [Accessed 21 May
2016].
10. Everett, C. (2015). Big data – the future of cyber-security or its latest threat?. Computer
Fraud & Security, 2015(9), pp.14-17.
11. Hadoop.apache.org. (2016). Welcome to Apache™ Hadoop®!. [online] Available at:
http://hadoop.apache.org/ [Accessed 9 May 2016].
12. Hu, H., Wen, Y., Chua, T. and Li, X. (2014). Toward Scalable Systems
for Big Data Analytics: A Technology Tutorial. IEEE Access, 2, pp.652-687.
13. Hashem, I., Yaqoob, I., Anuar, N., Mokhtar, S., Gani, A. and Ullah Khan, S. (2015). The rise
of “big data” on cloud computing: Review and open research issues. Information
Systems, 47, pp.98-115.
14. http://www.stat.ufl.edu/. (n.d.). Standard Normal Probabilities. [online] Available at:
http://www.stat.ufl.edu/~athienit/Tables/Ztable.pdf [Accessed 19 Jun. 2016].
15. Hurst, W., Merabti, M. and Fergus, P. (2014). Big Data Analysis Techniques for Cyber-
threat Detection in Critical Infrastructures. 2014 28th International Conference on
Advanced Information Networking and Applications Workshops.
16. Information Security Forum, (2012). Data Analytics for Information Security: From
hindsight to insight. London: Information Security Forum Ltd, pp.1 - 3.
17. Internet Security Threat Report. (2016). ISTR. [online] California: Symantec. Available at:
https://www.symantec.com/content/dam/symantec/docs/reports/istr-21-2016-en.pdf
[Accessed 14 May 2016].
18. Krishnan, R. (2016). NSA Data Center Experiencing 300 Million Hacking Attempts Per
Day. [online] The Hacker News. Available at: http://thehackernews.com/2016/02/nsa-
utah-data-center.html [Accessed 1 Mar. 2016].
19. Lee, T. (2014). The Sony hack: how it happened, who is responsible, and what we've
learned. [online] Vox. Available at: http://www.vox.com/2014/12/14/7387945/sony-
hack-explained [Accessed 19 May 2016].
20. Leung, W. (2001). How to design a questionnaire. Student BMJ, [online] 9. Available at:
http://www.cochrane.es/files/Recursos/How_to_design_a_questionnaire.pdf [Accessed
5 Jun. 2016].
21. McLaughlin, K., Sezer, S., Smith, P., Ma, Z. and Skopik, F. (2014). PRECYSE: Cyber-attack
Detection and Response for Industrial Control Systems. [online] Available at:
http://precyse.eu/downloads/ [Accessed 7 Apr. 2016].
22. Mueller, R. (2012). Combating Threats in the Cyber World: Outsmarting Terrorists,
Hackers, and Spies. [online] FBI. Available at:
https://www.fbi.gov/news/speeches/combating-threats-in-the-cyber-world-
outsmarting-terrorists-hackers-and-spies [Accessed 21 May 2016].
23. Open.edu. (2014). [online] Available at:
http://www.open.edu/openlearnworks/mod/resource/view.php?id=52658 [Accessed 5
Jun. 2016].
24. Polak, K. (2016). Keeping European datacentres safe from cyber attacks. [online]
ComputerWeekly. Available at: http://www.computerweekly.com/feature/Keeping-
European-datacentres-safe-from-cyber-attacks [Accessed 1 Mar. 2016].
25. PRECYSE. (2012). [online] Available at: http://precyse.eu/overview/ [Accessed 7 Apr.
2016].
26. Rivera, J. (2014). By 2016, 25 Percent of Large Global Companies Will Have Adopted Big
Data Analytics For At Least One Security or Fraud Detection Use Case. [online]
Gartner.com. Available at: http://www.gartner.com/newsroom/id/2663015 [Accessed
23 Mar. 2016].
27. Rushton, K. (2014). Cyber-criminals could spark next financial crisis. [online]
Telegraph.co.uk. Available at:
http://www.telegraph.co.uk/finance/newsbysector/banksandfinance/11156260/Cyber-
criminals-could-spark-next-financial-crisis.html [Accessed 5 Mar. 2016].
28. Shackleford, D. (2016). Using Analytics to Predict Future Attacks and Breaches. [online]
SANS Institute. Available at:
http://www.sas.com/content/dam/SAS/en_us/doc/whitepaper2/sans-using-analytics-
to-predict-future-attacks-breaches-108130.pdf [Accessed 28 May 2016].
29. Shead, S. (2016). MIT scientists have built an AI that can detect 85% of cyber attacks —
but it still needs human help. [online] Business Insider. Available at:
http://uk.businessinsider.com/mit-scientists-build-ai-that-can-detect-85-of-cyber-
attacks-2016-4 [Accessed 27 Jul. 2016].
30. Silver-Greenberg, J., Goldstein, M. and Perlroth, N. (2016). JPMorgan Chase Hacking
Affects 76 Million Households. [online] DealBook. Available at:
http://dealbook.nytimes.com/2014/10/02/jpmorgan-discovers-further-cyber-security-
issues/?_php=true&_type=blogs&_r=1 [Accessed 5 Mar. 2016].
31. Skopik, F., Friedberg, I. and Fiedler, R. (2014). Dealing with advanced persistent threats
in smart grid ICT networks. ISGT 2014.
32. Storm.apache.org. (2015). Apache Storm. [online] Available at: http://storm.apache.org/
[Accessed 14 May 2016].
33. Sullivan, D. (2016). Introduction to big data security analytics in the enterprise. [online]
SearchSecurity. Available at: http://searchsecurity.techtarget.com/feature/Introduction-
to-big-data-security-analytics-in-the-enterprise [Accessed 4 Jun. 2016].
34. Tankard, C. (2012). Big data security. Network Security, 2012(7), pp.5-8.
35. Trend Micro, (2015). Report on Cybersecurity and Critical Infrastructure in the Americas.
[online] Trend Micro Inc. Available at: http://www.trendmicro.com/cloud-
content/us/pdfs/security-intelligence/reports/critical-infrastructures-west-
hemisphere.pdf [Accessed 23 Apr. 2016].
36. Tsai, C., Lai, C., Chao, H. and Vasilakos, A. (2015). Big data analytics: a survey. Journal of
Big Data, 2(1).
37. Ulltveit-Moe, N., Gjosaeter, T., Assev, S., Koien, G. and Oleshchuk, V. (2013). Privacy
Handling for Critical Information Infrastructures. [online] Available at:
http://precyse.eu/downloads/ [Accessed 7 Apr. 2016].
38. van Kessel, P. and Allan, K. (2014). Get ahead of cybercrime. EY's Global Information
Security Survey. [online] EYGM. Available at:
http://www.ey.com/Publication/vwLUAssets/EY-global-information-security-survey-
2014/$FILE/EY-global-information-security-survey-2014.pdf [Accessed 19 May 2016].
39. Virvilis, N. and Gritzalis, D. (2013). The Big Four - What We Did Wrong in Advanced
Persistent Threat Detection?. 2013 International Conference on Availability, Reliability
and Security.
40. Vom Brocke, J., Simons, A., Niehaves, B., Riemer, K., Plattfaut, R. and Cleven, A. (2009).
RECONSTRUCTING THE GIANT: ON THE IMPORTANCE OF RIGOUR IN DOCUMENTING THE
LITERATURE SEARCH PROCESS. ECIS, 161, pp.1 - 14.
41. Williams, R. (2016). BT broadband suffers major outage across UK. [online] The
Telegraph. Available at: http://www.telegraph.co.uk/technology/2016/02/02/bt-
broadband-suffers-major-outage-across-uk/ [Accessed 16 Apr. 2016].
42. Zetter, K. (2016). Sony Got Hacked Hard: What We Know and Don’t Know So Far. [online]
WIRED. Available at: http://www.wired.com/2014/12/sony-hack-what-we-know/
[Accessed 16 Apr. 2016].