
Cybersecurity and the role of artificial intelligence

Arlindo Oliveira

Outline

• What is Artificial Intelligence?

– Good old fashioned artificial intelligence

– Machine learning

– Deep learning

• Implications of AI in privacy

– Identification from big data is very easy

• Applications of machine learning in security

– Intrusion detection

– Website and email classification

The many springs and falls of Artificial Intelligence

• Initial enthusiasm in the 50’s: in 1958 the New York Times optimistically called the perceptron “the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence”

• Realization that true AI is very hard and the fall of the perceptron in 1969, with the book “Perceptrons”

• Development of Multi-Layer Perceptrons (MLP) (mid 80’s)

• Realization that MLPs are very hard to train (early 90’s)

• Appearance of Support Vector Machines (mid 90’s)

• Deep learning (2010’s)

Good Old Fashioned Artificial Intelligence (GOFAI)

• Artificial Intelligence research started in the 50’s, more than 60 years ago.

• Alan Turing started the field with an influential paper (“Computing Machinery and Intelligence”, 1950)

• Other researchers soon followed

• Most approaches tried to develop formal models of human behavior:

– Knowledge representation

– Planning and scheduling

– Grammars and language rules

– Search and optimization

– Theorem proving

– Game playing

– Rule-based systems for vision and perception

Good Old Fashioned Artificial Intelligence (GOFAI) (continued)

• Many results were obtained:

– Efficient search methods, using heuristics and pruning techniques

– New languages and paradigms for AI-related problems (LISP, Prolog)

• But:

– Approach is insufficiently flexible

– Real world is too complex to be formalized

– Impossible to hand-code all combinations

To the rescue, an idea by Turing: Machine Learning

Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce one which simulates the child’s? If this were then subjected to an appropriate course of education one would obtain the adult brain. Presumably the child brain is something like a notebook as one buys it from the stationer’s. Rather little mechanism, and lots of blank sheets. Our hope is that there is so little mechanism in the child brain that something like it can be easily programmed. (Turing 1950)

Machine Learning

• Use datasets to learn rules

• Each instance is defined by a set of attributes

• Supervised learning:

– Each instance has a label, and the system learns to map from attributes to the label

• Unsupervised learning:

– No labels are given; the system groups together instances that are similar and learns to assign instances to these groups

• Semi-supervised learning:

– Only some of the instances are labeled

Sample data set

To which class (black or red) do the yellow circles belong?

Machine Learning Techniques

• Regression

• Distance-based learning

• Probabilistic: Naïve Bayes, Bayes networks

• Decision trees

• Neural networks

• Support vector machines

• Clustering

• Association rules

• Boosting and bagging

Distance-based learning (nearest neighbor) decision boundary
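To make the nearest-neighbor idea concrete, here is a minimal k-nearest-neighbor classifier. It is a sketch that assumes numeric attributes and Euclidean distance; the two-class data (labels "black" and "red", echoing the sample data set) and the choice k = 3 are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (attributes, label) pairs; attributes are numeric tuples.
    """
    # Sort training instances by Euclidean distance to the query point.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote over the labels of the k closest instances.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D data set with two classes.
train = [
    ((1.0, 1.0), "black"), ((1.5, 2.0), "black"), ((2.0, 1.0), "black"),
    ((6.0, 6.0), "red"), ((7.0, 5.5), "red"), ((6.5, 7.0), "red"),
]

print(knn_predict(train, (1.8, 1.5)))  # → black
print(knn_predict(train, (6.2, 6.1)))  # → red
```

With k = 1 the decision boundary is the set of points equidistant from the nearest examples of each class; larger k smooths that boundary at the cost of more distance computations per query.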

Example of decision tree for attempted security attack

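[Figure: decision tree that tests Origin (Germany, China, Russia) at the root, then Port (465, 25) or Destination (Germany, USA), with Yes/No leaves indicating an attempted attack]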

Decision tree class boundaries

Algorithms that search for the right tree have been developed

[Figure: decision tree with tests on attributes A1, A2, A3 and A4, and the corresponding partition of the + and - examples into class regions]
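One classic search strategy is ID3 (Quinlan), which grows the tree greedily, splitting each node on the attribute with the highest information gain. The sketch below implements that criterion on a handful of hypothetical connection records that reuse the attribute names of the attack example (origin, port, destination); it illustrates the technique and is not claimed to be how the trees shown here were built.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Expected reduction in entropy from splitting the rows on attribute `attr`."""
    n = len(labels)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def id3(rows, labels, attrs):
    """Greedily build a tree as nested (attribute, {value: subtree}) tuples."""
    # Leaf: a pure node, or no attributes left to test -> majority label.
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    branches = {}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = id3([rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return (best, branches)

# Hypothetical connection records and labels ("Yes" = attempted attack).
rows = [
    {"origin": "Germany", "port": 465, "destination": "USA"},
    {"origin": "Germany", "port": 25,  "destination": "USA"},
    {"origin": "China",   "port": 25,  "destination": "USA"},
    {"origin": "China",   "port": 465, "destination": "USA"},
    {"origin": "Russia",  "port": 465, "destination": "Germany"},
    {"origin": "Russia",  "port": 25,  "destination": "USA"},
]
labels = ["No", "Yes", "Yes", "Yes", "Yes", "No"]

tree = id3(rows, labels, ["origin", "port", "destination"])
print(tree[0])  # → origin (highest information gain at the root)
```

The greedy choice keeps the search tractable; it does not guarantee the smallest possible tree.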

Single perceptron

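• Linear threshold unit (LTU): inputs x1, …, xn with weights w1, …, wn, plus a bias weight w0 on a constant input x0 = 1

$$o(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } \sum_{i=0}^{n} w_i x_i > 0 \\ -1 & \text{otherwise} \end{cases}$$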

Decision Surface of a Perceptron

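[Figure: two sets of + and - examples in the (x1, x2) plane; on the left the classes can be separated by a line, on the right they cannot]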

• Perceptron is able to represent some useful functions

• And(x1, x2): choose weights w0 = -1.5, w1 = 1, w2 = 1

• But functions that are not linearly separable (e.g. Xor) are not representable
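The AND weights can be checked directly with a linear threshold unit. The sketch below also applies the standard perceptron training rule, w_i <- w_i + eta * (t - o) * x_i, to AND (linearly separable, so training converges) and to XOR (not separable, so some example always stays misclassified). Inputs in {0, 1}, targets in {-1, 1}, the learning rate eta = 0.1 and the 100-epoch limit are assumptions of this example.

```python
def ltu(weights, x):
    """Linear threshold unit: 1 if sum_{i=0..n} w_i x_i > 0, else -1 (x_0 = 1)."""
    net = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1 if net > 0 else -1

# AND with the weights from the slide: w0 = -1.5, w1 = 1, w2 = 1.
and_weights = [-1.5, 1.0, 1.0]
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, ltu(and_weights, x))  # only (1, 1) gives 1

def train_perceptron(examples, n_inputs, eta=0.1, epochs=100):
    """Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i."""
    w = [0.0] * (n_inputs + 1)
    for _ in range(epochs):
        errors = 0
        for x, t in examples:
            o = ltu(w, x)
            if o != t:
                errors += 1
                w[0] += eta * (t - o)  # bias weight, input x_0 = 1
                for i, xi in enumerate(x, start=1):
                    w[i] += eta * (t - o) * xi
        if errors == 0:  # every example classified correctly
            break
    return w

def accuracy(w, examples):
    return sum(ltu(w, x) == t for x, t in examples) / len(examples)

AND = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
XOR = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]
print(accuracy(train_perceptron(AND, 2), AND))  # → 1.0
print(accuracy(train_perceptron(XOR, 2), XOR))  # below 1.0: no line separates XOR
```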

Perceptron class boundary

Sigmoid Unit

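Inputs x1, …, xn with weights w1, …, wn, plus a bias weight w0 on a constant input x0 = 1

$$net = \sum_{i=0}^{n} w_i x_i, \qquad o = \sigma(net) = \frac{1}{1 + e^{-net}}$$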
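A sigmoid unit computes the same weighted sum as the LTU but replaces the hard threshold with the smooth function σ. A minimal sketch follows; it also shows the derivative σ(net)(1 - σ(net)), which is what gradient-based training such as backpropagation relies on. The example weights are simply reused from the AND perceptron for illustration.

```python
import math

def sigmoid(net):
    """Logistic function sigma(net) = 1 / (1 + e^(-net))."""
    return 1.0 / (1.0 + math.exp(-net))

def sigmoid_unit(weights, x):
    """Output of a sigmoid unit: sigma(sum_{i=0..n} w_i x_i), with x_0 = 1."""
    net = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return sigmoid(net)

def sigmoid_derivative(net):
    """d sigma / d net = sigma(net) * (1 - sigma(net))."""
    s = sigmoid(net)
    return s * (1.0 - s)

print(sigmoid(0.0))                              # → 0.5
print(sigmoid_unit([-1.5, 1.0, 1.0], (1, 1)))    # sigma(0.5), about 0.62
print(sigmoid_derivative(0.0))                   # → 0.25
```

Because the output is differentiable, the error of a whole network can be propagated back to every weight, which is not possible with the step function of the LTU.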

Multi-Layer Networks

[Figure: multi-layer network with an input layer, a hidden layer and an output layer]
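A network with a single hidden layer can represent XOR, which a single perceptron cannot. In the sketch below the weights are hand-picked for illustration, not learned: one hidden sigmoid unit approximates OR, the other approximates NAND, and the output unit approximates the AND of the two. In practice such weights would be found by training, e.g. with backpropagation.

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def layer(weights, inputs):
    """One layer of sigmoid units; each row is [w0, w1, ..., wn] for one unit (x0 = 1)."""
    return [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], inputs)))
            for w in weights]

# Hand-picked weights for a 2-2-1 network (large values make the sigmoids nearly 0 or 1).
HIDDEN = [[-10.0, 20.0, 20.0],   # approximately OR(x1, x2)
          [30.0, -20.0, -20.0]]  # approximately NAND(x1, x2)
OUTPUT = [[-30.0, 20.0, 20.0]]   # approximately AND(h1, h2)

def mlp_xor(x1, x2):
    hidden = layer(HIDDEN, [x1, x2])
    return layer(OUTPUT, hidden)[0]

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, round(mlp_xor(*x)))  # prints 0, 1, 1, 0 for the four inputs
```

The hidden layer maps the inputs into a new space where the two classes become linearly separable, which is exactly what the output unit needs.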

Multi-layer perceptron class boundary


What is different about AI today?

• Deep learning puts together:

– High-performance computers

– Very large data sets

– Lots of accumulated knowledge about AI

• Delivers:

– Ability to work with real-world data

– Handles vision, natural language, high-dimensional data

– Impact in economically very relevant applications

Implications of Artificial Intelligence in Privacy

• Data mining is being applied in an increasingly large number of domains

• Data streams can be easily integrated by machine learning algorithms

– Mobile device data

– Electronic trail data

– Video and photography data

– Satellite data

Ensuring privacy is no longer easy, and may become impossible in the near future

• Our digital trail is simply too conspicuous

• Integration of different data streams has become possible

• Instead of demanding privacy, which is impossible, one should demand:

– Guarantees of adequate use of inferred data

– Mechanisms that impose heavy costs on wrong uses
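Why integrating data streams undermines privacy can be shown with a simple linkage: a data set stripped of names can still be joined with another source on attributes such as postal code, birth date and sex. The sketch below is purely illustrative; all records, names and field names are hypothetical, and real linkage attacks use far larger data and fuzzier matching.

```python
from collections import Counter

# Hypothetical "anonymized" records: names removed, quasi-identifiers kept.
health = [
    {"zip": "1000-001", "birth": "1980-03-14", "sex": "F", "diagnosis": "asthma"},
    {"zip": "1000-001", "birth": "1975-07-02", "sex": "M", "diagnosis": "diabetes"},
    {"zip": "2000-123", "birth": "1980-03-14", "sex": "F", "diagnosis": "flu"},
]
# Hypothetical second data stream that does include names.
public = [
    {"name": "Alice", "zip": "1000-001", "birth": "1980-03-14", "sex": "F"},
    {"name": "Bob", "zip": "1000-001", "birth": "1975-07-02", "sex": "M"},
]

QUASI = ("zip", "birth", "sex")

def key(record):
    """Combination of quasi-identifying attributes for one record."""
    return tuple(record[q] for q in QUASI)

def link(anonymized, identified):
    """Map names to sensitive values when a key is unique in both data sets."""
    anon_counts = Counter(key(r) for r in anonymized)
    id_counts = Counter(key(r) for r in identified)
    names = {key(r): r["name"] for r in identified}
    return {names[key(r)]: r["diagnosis"]
            for r in anonymized
            if anon_counts[key(r)] == 1 and id_counts[key(r)] == 1}

print(link(health, public))  # → {'Alice': 'asthma', 'Bob': 'diabetes'}
```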

Why use AI in cybersecurity?

• AI is now considered crucial to cybersecurity

• It can protect sites from attacks and quickly identify new threats

• It can speed up the process of noticing attacks

• It helps prevent a full-scale breach from being unleashed

• There are many startups in this area, including at least one from Portugal

Applications of Machine Learning in Security

• Intrusion detection

• Examples

– Malicious JavaScript and other scripts

– Malicious non-executable files

– Malicious executable files

• Inappropriate Web and email content

• Phishing

– Derive probabilistic models of phishing attacks

– Derive probabilistic reputation models for URLs

Difficulties in dealing with intrusions

• Attackers try to hide their actions from either an individual who is monitoring the system or an Intrusion Detection System (IDS)

– They cover their tracks by editing system logs

– They reset the modification date of a file that they replaced or modified

• Never-before-seen intrusions

– Undetectable by signature based IDS

– Can be detected as anomalies by observing significant deviations from the normal network behavior

Signature based vs. Anomaly detection

• Signature-based IDSs rely on existing knowledge of patterns associated with known attacks, provided by human experts

– Existing approaches: pattern (signature) matching, expert systems, state transition analysis

– Unable to detect novel and unanticipated attacks

– The signature database has to be revised for each newly discovered type of attack

• Anomaly detection builds profiles that represent the normal behavior of users, hosts, or networks, and detects attacks as significant deviations from these profiles

– Potentially able to recognize unforeseen attacks

– Possibly high false alarm rate, since detected deviations do not necessarily represent actual attacks

– Major approaches: statistical methods, expert systems, clustering, neural networks, support vector machines
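The contrast between the two approaches can be sketched in a few lines. Signature matching only finds patterns already in its database; a simple statistical anomaly detector (one of the "statistical methods" listed) learns a profile of normal values and flags large deviations. The signatures, the connections-per-minute feature, its values and the 3-standard-deviation threshold are all hypothetical choices for this example.

```python
import statistics

# Hypothetical signature database: patterns associated with known attacks.
SIGNATURES = {
    "sql_injection": "' OR '1'='1",
    "path_traversal": "../../",
}

def signature_match(payload):
    """Signature-based detection: names of all known patterns found in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]

def build_profile(normal_values):
    """Profile of normal behavior: mean and standard deviation of one feature."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)

def is_anomalous(value, profile, k=3.0):
    """Flag values more than k standard deviations away from the normal mean."""
    mean, std = profile
    return abs(value - mean) > k * std

print(signature_match("GET /login?user=' OR '1'='1"))  # → ['sql_injection']
print(signature_match("GET /new-attack-payload"))       # → [] (novel attack missed)

# Hypothetical feature: connections per minute from one host under normal use.
profile = build_profile([20, 22, 19, 25, 21, 23, 18, 24])
print(is_anomalous(26, profile))   # → False
print(is_anomalous(250, profile))  # → True
```

The threshold k sets the trade-off discussed above: a smaller k catches more attacks but also raises more false alarms on legitimate but unusual behavior.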

Using Data Mining for Intrusion Detection

• Misuse detection

– Predictive models are built from labeled data sets (instances are labeled as “normal” or “intrusive”)

– These models can be more sophisticated and precise than manually created signatures

– Unable to detect attacks whose instances have not yet been observed

• Anomaly detection

– Build models of “normal” behavior and detect anomalies as deviations from it

– Possibly high false alarm rate (false positives): previously unseen (yet legitimate) system behaviors may be recognized as anomalies

• Possible false negatives: missed attacks

Detection Rate vs False Alarm Rate

• ROC Curves represent the trade-off between detection rate and false alarm rate

• Plot of detection rate vs. false alarm rates

• Ideal system should have 100% detection rate with 0% false alarms

Standard metrics for evaluating the detection of intrusions (attacks)

                                      Predicted: Normal      Predicted: Intrusion (attack)
Actual: Normal                        True Negative (TN)     False Alarm (FP)
Actual: Intrusion (attack)            False Negative (FN)    Correctly detected intrusions: detection rate (TP)
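The two rates used throughout this section follow from the four cells of the table: detection rate = TP / (TP + FN) and false alarm rate = FP / (FP + TN). The sketch below counts the cells from actual and predicted labels; the eight labeled connections are hypothetical.

```python
def confusion_counts(actual, predicted, positive="attack"):
    """Count TP, FP, TN, FN, treating `positive` as the intrusion class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return tp, fp, tn, fn

def detection_rate(tp, fn):
    """Fraction of actual intrusions that were detected."""
    return tp / (tp + fn)

def false_alarm_rate(fp, tn):
    """Fraction of normal connections flagged as intrusions."""
    return fp / (fp + tn)

# Hypothetical labels for eight connections.
actual    = ["normal", "normal", "attack", "normal", "attack", "normal", "normal", "attack"]
predicted = ["normal", "attack", "attack", "normal", "normal", "normal", "normal", "attack"]
tp, fp, tn, fn = confusion_counts(actual, predicted)
print(tp, fp, tn, fn)            # → 2 1 4 1
print(detection_rate(tp, fn))    # about 0.667
print(false_alarm_rate(fp, tn))  # → 0.2
```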

ROC curves for different outlier detection techniques

[Figure: detection rate (0 to 1) vs. false alarm rate (0 to 0.10) for the LOF approach, the NN approach, the Mahalanobis approach and an unsupervised SVM]
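Each outlier detection technique in such a comparison produces a score per instance; sweeping a threshold over that score yields one point of the ROC curve per threshold. A minimal sketch, assuming higher scores mean "more suspicious" and using hypothetical scores and labels (1 = intrusion, 0 = normal):

```python
def roc_points(scores, labels):
    """(false alarm rate, detection rate) pairs obtained by sweeping a threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    for threshold in sorted(set(scores), reverse=True):
        # Flag every instance whose score reaches the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

# Hypothetical anomaly scores and true labels.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
for far, dr in roc_points(scores, labels):
    print(f"false alarm rate {far:.2f}  detection rate {dr:.2f}")
```

Lowering the threshold can only raise both rates, so the curve is monotone; an ideal detector would reach a detection rate of 1 at a false alarm rate of 0.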

Machine learning techniques used in intrusion detection

• Supervised learning

– Support vector machines for classification and anomaly detection

– Some use of neural networks

– Various probabilistic models, such as naïve Bayes variations

• Unsupervised learning

– Hidden Markov models (HMMs) and more complex variations thereof

– Various clustering algorithms, mixtures of Gaussians (MoG), k-nearest neighbors (KNN)

– Dimensionality reduction algorithms (kernel PCA)

• Other

– AdaBoost, mixtures of experts

• Disclaimer

– Not all are used in end products, and unfortunately we cannot say which techniques are used in which applications

Example: Malicious JavaScript

• Normal document classification works on the presence of “words” in files

• It’s also possible to encapsulate other information in models

– E.g. naïve Bayes classifiers for email use pseudo-words like “sender-tld:info”, “sender-tld:com”, “address-known:false” and “address-known:true” to improve accuracy

• Similar methods can be used with JavaScript:

– Extract words (though not all words) and other features of interest

– Feed these to a machine learning model
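The pseudo-word idea amounts to appending extra tokens to a document's word list so that an ordinary text classifier can use metadata. The sketch below pairs that with a multinomial naïve Bayes classifier using add-one smoothing. The message fields ("body", "sender", "known") and the four training messages are hypothetical; for JavaScript the body tokens would be replaced by tokens extracted from the script.

```python
import math
from collections import Counter

def features(message):
    """Body words plus pseudo-words encoding metadata (hypothetical message fields)."""
    tokens = message["body"].lower().split()
    tld = message["sender"].rsplit(".", 1)[-1]
    tokens.append(f"sender-tld:{tld}")
    tokens.append(f"address-known:{str(message['known']).lower()}")
    return tokens

class NaiveBayes:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = set(labels)
        self.prior = {c: labels.count(c) / len(labels) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for tokens, c in zip(docs, labels):
            self.counts[c].update(tokens)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, tokens, c):
        # log P(c) + sum of log P(token | c); the shared evidence term is omitted.
        v = len(self.vocab)
        score = math.log(self.prior[c])
        for t in tokens:
            score += math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
        return score

    def predict(self, tokens):
        return max(self.classes, key=lambda c: self.log_posterior(tokens, c))

train = [
    ({"body": "meeting agenda attached", "sender": "alice@example.com", "known": True}, "ham"),
    ({"body": "project report draft", "sender": "bob@example.org", "known": True}, "ham"),
    ({"body": "win free prize now", "sender": "promo@deals.info", "known": False}, "spam"),
    ({"body": "free money click now", "sender": "offer@cheap.info", "known": False}, "spam"),
]
model = NaiveBayes().fit([features(m) for m, _ in train], [c for _, c in train])
suspect = {"body": "free prize click", "sender": "x@win.info", "known": False}
print(model.predict(features(suspect)))  # → spam
```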

Example: Malicious JavaScript

• Complications arise due to the extreme use of obfuscation techniques by attackers

– And also by legitimate vendors (e.g. Google)

– And by large Web 2.0 libraries

v46f658f5e2260(v46f658f5e3226){ function v46f658f5e4207 () {return 16;}
return(parseInt(v46f658f5e3226,v46f658f5e4207()));}function
v46f658f5e61f4(v46f658f5e7174){ function v46f658f5ea0cd () {return 2;} var
v46f658f5e813e='';for(v46f658f5e9105=0;
v46f658f5e9105<v46f658f5e7174.length; v46f658f5e9105+=v46f658f5ea0cd()){
v46f658f5e813e+=(String.fromCharCode(v46f658f5e2260(v46f658f5e7174.substr(v46f658f5e9105, v46f658f5ea0cd()))));}return v46f658f5e813e;}
document.write(v46f658f5e61f4('3C5343524950543E77696E646F772E7374617475733D2'));

• The above is JavaScript, but where are the features?
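One possible answer is to use features that do not depend on identifier names, which obfuscators randomize. The sketch below computes a few such features: character entropy, the longest string literal, the number of long hex-only strings (a common way to hide an encoded payload, as in the snippet above), and counts of selected calls. The feature set and the list of "suspicious" calls are hypothetical choices for illustration, not a validated detector, and the regular expressions are deliberately crude.

```python
import math
import re
from collections import Counter

# Hypothetical list of calls often found in the decoding loops of obfuscated scripts.
SUSPICIOUS_CALLS = ("document.write", "String.fromCharCode", "parseInt", "eval", "unescape")

STRING_LITERAL = re.compile(r"'([^']*)'|\"([^\"]*)\"")
IDENTIFIER_LIKE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")
HEX_ONLY = re.compile(r"[0-9A-Fa-f]+")

def char_entropy(text):
    """Shannon entropy of the characters in `text`, in bits per character."""
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in Counter(text).values())

def script_features(source):
    """Map raw JavaScript to numeric features that ignore identifier names."""
    strings = [a or b for a, b in STRING_LITERAL.findall(source)]
    # Identifier-like tokens (this crude regex also matches inside strings).
    tokens = IDENTIFIER_LIKE.findall(source)
    features = {
        "entropy": char_entropy(source),
        "longest_string": max((len(s) for s in strings), default=0),
        "hex_strings": sum(1 for s in strings if len(s) >= 16 and HEX_ONLY.fullmatch(s)),
        "mean_token_length": sum(map(len, tokens)) / max(len(tokens), 1),
    }
    for name in SUSPICIOUS_CALLS:
        features["calls:" + name] = source.count(name)
    return features

sample = "document.write(decode('3C5343524950543E'));"  # 'decode' is a made-up name
print(script_features(sample))
```

A feature vector like this can then be fed to any of the classifiers discussed earlier.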

Challenges for ML based intrusion detection

• Large data size

– E.g. millions of network connections are common for commercial network sites, …

• High dimensionality

– Hundreds of dimensions are possible

• Temporal nature of the data

– Data points close in time are highly correlated

• Skewed class distribution

– Interesting events are very rare: looking for the “needle in a haystack”

• Data preprocessing

– Converting data from the monitored system into data appropriate for analysis

• High Performance Computing (HPC) is critical for on-line, scalable and distributed intrusion detection

“Mining needle in a haystack. So much hay and so little time”
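The skewed class distribution is also why plain accuracy is a misleading metric for intrusion detection. A minimal illustration with hypothetical traffic of one intrusion among 10,000 connections: a "detector" that never raises an alarm scores 99.99% accuracy yet detects nothing.

```python
# Hypothetical traffic: 1 intrusion among 10,000 connections.
labels = ["attack"] + ["normal"] * 9999
always_normal = ["normal"] * len(labels)  # trivial detector that never raises an alarm

accuracy = sum(a == p for a, p in zip(labels, always_normal)) / len(labels)
detected = sum(a == p == "attack" for a, p in zip(labels, always_normal))
print(accuracy)  # → 0.9999
print(detected)  # → 0
```

Detection rate and false alarm rate, as used in ROC curves, do not suffer from this problem because each is computed within a single class.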

Inappropriate Content Detection

• Basically just document classification

• Want to stop bad sites and/or messages based on their content

– Porn, hate, ...

• Good classifiers: naïve Bayes

– Multinomial, Bernoulli → multinomial mixture model

• These have problems; in practice, IR techniques such as TF/IDF are added

• SVM approaches work better

• Also topic-based approaches

– LSA / LDA
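TF-IDF weights a term by how often it occurs in a document, discounted by how many documents contain it, so words common to every page carry little weight. Several variants exist; the sketch below uses term frequency normalized by document length and idf = log(N / df). The three page texts are hypothetical; the resulting weight vectors could be fed to a classifier such as an SVM.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Per-document TF-IDF weights: (count / doc length) * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()})
    return weights

pages = [
    "cheap pills buy now",
    "buy tickets for the concert now",
    "concert review and photos",
]
w = tf_idf(pages)
print(w[0]["pills"] > w[0]["buy"])  # → True: "pills" is rarer across pages than "buy"
```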

Identifying phishing attempts

• Also a form of document classification

• Prior probabilities for given URLs can help in classifying phishing attempts

• Techniques:

– Naïve Bayes

– SVMs and other supervised classifiers

• Mail filters can be trained on-line, using semi-supervised approaches
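How a prior probability for a URL combines with evidence from the message content follows from Bayes' rule: P(phish | content) = P(content | phish) P(phish) / [P(content | phish) P(phish) + P(content | legit) P(legit)]. In the sketch below the prior would come from a URL reputation model; both priors and the two likelihood values are hypothetical numbers chosen for illustration.

```python
def posterior_phishing(prior, likelihood_phish, likelihood_legit):
    """Bayes' rule: combine a URL prior with content likelihoods."""
    evidence = likelihood_phish * prior + likelihood_legit * (1.0 - prior)
    return likelihood_phish * prior / evidence

# Same (hypothetical) content evidence, two different URL priors.
print(posterior_phishing(0.01, 0.8, 0.1))  # reputable URL: about 0.075
print(posterior_phishing(0.30, 0.8, 0.1))  # poorly rated URL: about 0.77
```

The same content evidence thus leads to very different decisions depending on the URL's reputation.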

Any questions?
