Cybersecurity and the role of artificial intelligence
Arlindo Oliveira
Outline
• What is Artificial Intelligence?
– Good old fashioned artificial intelligence
– Machine learning
– Deep learning
• Implications of AI in privacy
– Identification from big data is very easy
• Applications of machine learning in security
– Intrusion detection
– Website and email classification
The many springs and falls of Artificial Intelligence
• Initial enthusiasm in the 1950s
– In 1958 the New York Times optimistically called the perceptron “the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence”
• Realization that true AI is very hard, and the fall of the perceptron in 1969 with Minsky and Papert’s book “Perceptrons”
• Development of Multi-Layer Perceptrons (MLP) (mid-1980s)
• Realization that MLPs are very hard to train (early 1990s)
• Appearance of Support Vector Machines (mid-1990s)
• Deep learning (2010s)
Good Old Fashioned Artificial Intelligence (GOFAI)
• Artificial Intelligence research started in the 1950s, more than 60 years ago.
• Alan Turing started the field with an influential paper
• Other researchers soon followed
• Most approaches tried to develop formal models of human behavior:
– Knowledge representation
– Planning and scheduling
– Grammars and language rules
– Search and optimization
– Theorem proving
– Game playing
– Rule-based systems for vision and perception
Good Old Fashioned Artificial Intelligence (GOFAI) (continued)
• Many results were obtained:
– Efficient search methods, using heuristics and pruning techniques
– New languages and paradigms for AI related problems (LISP, Prolog)
• But:
– The approach is insufficiently flexible
– The real world is too complex to be formalized
– Impossible to hand-code all combinations
To the rescue, an idea by Turing: Machine Learning
Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce one which simulates the child’s? If this were then subjected to an appropriate course of education one would obtain the adult brain. Presumably the child brain is something like a notebook as one buys it from the stationer’s. Rather little mechanism, and lots of blank sheets. Our hope is that there is so little mechanism in the child brain that something like it can be easily programmed.
(Turing 1950)
Machine Learning
• Use datasets to learn rules
• Each instance is defined by a set of attributes
• Supervised learning:
– Each instance has a label, and the system learns to map from attributes to the label
• Unsupervised learning:
– Instances have no labels; the system groups similar events together, and a label can then be attributed to each group
• Semi-supervised learning:
– Only some instances are labeled; the system learns from both the labeled and the unlabeled data
Sample data set
Which class (black or red) do the yellow circles belong to?
Machine Learning Techniques
• Regression
• Distance-based learning
• Probabilistic: Naïve Bayes, Bayes networks
• Decision trees
• Neural networks
• Support vector machines
• Clustering
• Association rules
• Boosting and bagging
Distance based learning (Nearest Neighbor) decision
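A minimal sketch of nearest-neighbor classification, the kind of rule that could answer the yellow-circle question above: a new point receives the majority label of its k closest training points. The points, the "black"/"red" labels, Euclidean distance and k = 3 are all assumptions made for illustration.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Label `query` with the majority label among its k nearest training points.

    `train` is a list of ((x, y), label) pairs; distance is Euclidean (an assumption).
    """
    # Sort training points by distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote among the neighbours' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: two loose groups of labelled points.
train = [
    ((1.0, 1.0), "black"), ((1.5, 2.0), "black"), ((2.0, 1.0), "black"),
    ((6.0, 6.0), "red"), ((6.5, 7.0), "red"), ((7.0, 6.0), "red"),
]

print(knn_classify(train, (1.8, 1.6)))  # prints black
print(knn_classify(train, (6.2, 6.4)))  # prints red
```

No model is built in advance: all the work happens at query time, which is why distance-based methods are simple to write but can be slow on very large data sets.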
Example of decision tree for attempted security attack
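[Figure: decision tree that tests the connection’s origin (Germany, China, Russia), port (465, 25) and destination (Germany, USA), with Yes/No leaves indicating an attempted attack]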
Decision tree class boundaries
Algorithms that search for the right tree have been developed
[Figure: a decision tree testing attributes A1, A2, A3 and A4, with + and - leaves, and the class boundaries it induces]
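Tree-search algorithms such as ID3 grow the tree greedily, choosing at each node the attribute whose split most reduces class entropy (information gain). A minimal sketch of that choice for categorical attributes; the connection records, the attribute names `origin` and `port`, and the attack labels are hypothetical, loosely echoing the security-attack example above.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting `rows` on attribute `attr`."""
    n = len(rows)
    # Group the labels by the value each row takes for `attr`.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels, attrs):
    """Greedy ID3-style choice: the attribute with the largest information gain."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))

# Hypothetical connection records labelled as attack (True) or not (False).
rows = [
    {"origin": "Germany", "port": 465},
    {"origin": "Germany", "port": 25},
    {"origin": "China", "port": 25},
    {"origin": "China", "port": 465},
    {"origin": "Russia", "port": 25},
    {"origin": "Russia", "port": 465},
]
labels = [False, True, True, True, True, False]

print(best_attribute(rows, labels, ["origin", "port"]))  # prints port
```

Here splitting on `port` leaves one pure branch (every port-25 record is an attack), so it wins over `origin`; a full learner would then recurse on each branch until the leaves are pure or too small to split.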
Single perceptron
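• Linear threshold unit (LTU): inputs $x_1, \ldots, x_n$ with weights $w_1, \ldots, w_n$, plus a bias weight $w_0$ on a constant input $x_0 = 1$

$$o(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } \sum_{i=0}^{n} w_i x_i > 0 \\ -1 & \text{otherwise} \end{cases}$$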
Decision Surface of a Perceptron
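[Figure: two sets of + and - examples in the (x1, x2) plane: one linearly separable, which a perceptron can split with a straight line, and one that is not linearly separable]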
• A perceptron can represent some useful functions
• AND(x1, x2): choose weights w0 = -1.5, w1 = 1, w2 = 1
• But functions that are not linearly separable (e.g. XOR) are not representable
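A minimal sketch of the linear threshold unit, checked against the AND weights on this slide (w0 = -1.5, w1 = 1, w2 = 1) with inputs assumed to be 0 or 1. It also runs the classic perceptron training rule, w_i ← w_i + η(t − o)x_i (a standard rule added here, not taken from the slides), on XOR; since XOR is not linearly separable, no weight vector can reach zero errors. The learning rate and number of epochs are arbitrary choices.

```python
def ltu(weights, inputs):
    """Linear threshold unit: 1 if sum_i w_i * x_i > 0 (with x_0 = 1), else -1."""
    x = [1] + list(inputs)  # prepend the constant bias input x_0 = 1
    net = sum(w * xi for w, xi in zip(weights, x))
    return 1 if net > 0 else -1

def train_perceptron(examples, epochs=50, eta=0.1):
    """Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i."""
    n = len(examples[0][0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        for inputs, target in examples:
            o = ltu(w, inputs)
            x = [1] + list(inputs)
            w = [wi + eta * (target - o) * xi for wi, xi in zip(w, x)]
    return w

def errors(weights, examples):
    """Number of examples the unit misclassifies."""
    return sum(ltu(weights, inputs) != target for inputs, target in examples)

points = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = [(p, 1 if p == (1, 1) else -1) for p in points]
XOR = [(p, 1 if p[0] != p[1] else -1) for p in points]

# The hand-chosen AND weights classify every case correctly.
print(errors([-1.5, 1, 1], AND))  # prints 0

# XOR is not linearly separable, so training can never reach zero errors.
print(errors(train_perceptron(XOR), XOR) >= 1)  # prints True
```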
Perceptron class boundary
Sigmoid Unit
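• Inputs $x_1, \ldots, x_n$ with weights $w_1, \ldots, w_n$, plus a bias weight $w_0$ on a constant input $x_0 = 1$

$$\mathit{net} = \sum_{i=0}^{n} w_i x_i, \qquad o = \sigma(\mathit{net}) = \frac{1}{1 + e^{-\mathit{net}}}$$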
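A minimal sketch of the sigmoid unit: the same weighted sum as the perceptron, passed through σ(net) = 1/(1 + e^(−net)) so that the output varies smoothly between 0 and 1 instead of jumping between two values. Because σ is differentiable, gradient-based training of networks built from such units becomes possible. The weights simply reuse the AND example from the perceptron slide, as an illustrative choice.

```python
import math

def sigmoid(net):
    """Logistic function sigma(net) = 1 / (1 + e^(-net))."""
    return 1.0 / (1.0 + math.exp(-net))

def sigmoid_unit(weights, inputs):
    """Output of a sigmoid unit: sigma(sum_i w_i * x_i), with bias input x_0 = 1."""
    x = [1.0] + list(inputs)
    net = sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(net)

w = [-1.5, 1.0, 1.0]  # the AND weights, reused for illustration
for point in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    # Below 0.5 where the threshold unit outputs -1, above 0.5 where it outputs 1.
    print(point, round(sigmoid_unit(w, point), 3))
```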
Multi-Layer Networks
[Figure: network with an input layer, a hidden layer and an output layer]
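A minimal sketch of a forward pass through such a network: two inputs, one hidden layer of two sigmoid units and a single sigmoid output. The weights are hand-picked (an assumption, not learned): the hidden units approximate OR and NAND and the output approximates their AND, so the network computes XOR, which a single perceptron cannot represent.

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def layer(weights, inputs):
    """One layer of sigmoid units; each row of `weights` is [w0, w1, ..., wn]."""
    x = [1.0] + list(inputs)  # bias input x_0 = 1
    return [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in weights]

# Hand-picked (hypothetical) weights: hidden units ~ OR and NAND, output ~ AND.
HIDDEN = [[-10.0, 20.0, 20.0],   # ~ OR(x1, x2)
          [30.0, -20.0, -20.0]]  # ~ NAND(x1, x2)
OUTPUT = [[-30.0, 20.0, 20.0]]   # ~ AND(h1, h2)

def mlp(x1, x2):
    hidden = layer(HIDDEN, [x1, x2])  # input layer -> hidden layer
    return layer(OUTPUT, hidden)[0]   # hidden layer -> output layer

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(mlp(x1, x2)))  # rounds to XOR(x1, x2)
```

In practice the weights are learned (e.g. by backpropagation) rather than set by hand; the hidden layer is what lets the network draw the non-linear class boundaries shown on the following slides.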
Multi-layer perceptron class boundary
What is different about AI today?
• Deep learning puts together
– High-performance computers
– Very large data sets
– Lots of accumulated knowledge about AI
• Delivers
– Ability to work with real-world data
– Handles vision, natural language, high-dimensional data
– Impact on economically important applications
Implications of Artificial Intelligence in Privacy
• Data mining is being applied in an increasingly large number of domains
• Data streams can be easily integrated by machine learning algorithms
– Mobile device data
– Electronic trail data
– Video and photography data
– Satellite data
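Integrating data streams can re-identify people even when each source looks anonymous on its own. A minimal sketch of such a linkage: two hypothetical tables are joined on shared quasi-identifiers (postcode, birth date, sex), and any anonymous record that matches exactly one named person is re-identified. All records, field names and values are invented for illustration.

```python
# Hypothetical "anonymised" mobility records: no names, only quasi-identifiers.
mobility = [
    {"postcode": "1000-001", "birth": "1980-03-14", "sex": "F", "night_location": "Lisbon"},
    {"postcode": "4000-002", "birth": "1975-11-02", "sex": "M", "night_location": "Porto"},
]

# Hypothetical public register that does contain names.
register = [
    {"name": "Alice", "postcode": "1000-001", "birth": "1980-03-14", "sex": "F"},
    {"name": "Bruno", "postcode": "4000-002", "birth": "1975-11-02", "sex": "M"},
    {"name": "Carla", "postcode": "4000-002", "birth": "1990-01-20", "sex": "F"},
]

QUASI_IDS = ("postcode", "birth", "sex")

def link(anonymous, identified, keys=QUASI_IDS):
    """Re-identify anonymous records whose quasi-identifiers match exactly one named record."""
    index = {}
    for person in identified:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    matches = {}
    for record in anonymous:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # a unique match means the record is re-identified
            matches[names[0]] = record["night_location"]
    return matches

print(link(mobility, register))  # {'Alice': 'Lisbon', 'Bruno': 'Porto'}
```

No machine learning is even needed here: a plain join on a few common attributes is enough, and learning algorithms only make such integration easier when the attributes are noisy or indirect.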
Ensuring privacy is no longer easy, and may become impossible in the near future
• The digital trail is simply too conspicuous
• Integration of different data streams has become possible
• Instead of demanding privacy, which is impossible, one should demand:
– Guarantees that inferred data are used appropriately
– Mechanisms that impose heavy costs on wrong uses
Why use AI in cybersecurity?
• AI is now considered crucial to cybersecurity
• It can protect sites from attacks and quickly identify new threats
• It can speed up the process of noticing attacks
• It can help prevent a full-scale breach from being unleashed
• There are many startups in this area, including at least one from Portugal
Applications of Machine Learning in Security
• Intrusion detection
• Examples
– Malicious JavaScript and other scripts
– Malicious non-executable files
– Malicious executable files
• Inappropriate Web and email content
• Phishing
– Derive probabilistic models of phishing attacks
– Derive probabilistic reputation models for URLs
Difficulties in dealing with intrusions
• Attackers try to hide their actions from anyone monitoring the system, or from an Intrusion Detection System (IDS)
– They cover their tracks by editing system logs
– They reset the modification date of a file they replaced or modified
• Never-before-seen intrusions
– Undetectable by signature-based IDSs
– Can be detected as anomalies by observing significant deviations from the normal network behavior
Signature based vs. Anomaly detection
• Signature-based IDSs rely on existing knowledge of patterns associated with known attacks, provided by human experts
– Existing approaches: pattern (signature) matching, expert systems, state transition analysis
– Unable to detect novel & unanticipated attacks
– Signature database has to be revised for each new type of discovered attack
• Anomaly detection builds profiles that represent the normal behavior of users, hosts, or networks, and detects attacks as significant deviations from these profiles
– Potentially able to recognize unforeseen attacks.
– Possible high false alarm rate, since detected deviations do not necessarily represent actual attacks
– Major approaches: statistical methods, expert systems, clustering, neural networks, support vector machines
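A minimal sketch contrasting the two approaches on hypothetical data: a signature check that only matches known attack patterns, and a statistical anomaly detector that profiles normal behavior (here, requests per minute for one host) and flags values more than three standard deviations from the mean. The signatures, traffic figures and 3-sigma threshold are all assumptions.

```python
import statistics

# Hypothetical signature database of known attack patterns.
SIGNATURES = ["' OR 1=1 --", "<script>alert("]

def signature_match(payload):
    """Signature-based check: flag payloads containing a known attack pattern."""
    return any(sig in payload for sig in SIGNATURES)

def build_profile(normal_rates):
    """Profile of normal behaviour: mean and standard deviation of requests per minute."""
    return statistics.mean(normal_rates), statistics.stdev(normal_rates)

def is_anomalous(rate, profile, k=3.0):
    """Anomaly detection: flag rates more than k standard deviations from the mean."""
    mean, std = profile
    return abs(rate - mean) > k * std

# Hypothetical requests per minute observed for one host during normal operation.
profile = build_profile([52, 48, 50, 55, 47, 51, 49, 53, 50, 45])

print(signature_match("id=7' OR 1=1 --"))         # known attack: True
print(signature_match("id=7; DROP TABLE users"))  # novel attack, no signature: False
print(is_anomalous(400, profile))                 # large deviation: True
print(is_anomalous(56, profile))                  # within normal variation: False
```

A legitimate traffic spike would be flagged by `is_anomalous` just the same, which is the false alarm problem listed above for anomaly detection.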
Using Data Mining for Intrusion Detection
• Supervised (misuse) detection
– Predictive models are built from labeled data sets (instances are labeled as “normal” or “intrusive”)
– These models can be more sophisticated and precise than manually created signatures
– Unable to detect attacks whose instances have not yet been observed
• Anomaly detection
– Build models of “normal” behavior and detect anomalies as deviations from it
– Possible high false alarm rate (false positives): previously unseen (yet legitimate) system behaviors may be recognized as anomalies
– Possible false negatives: missed attacks
Detection Rate vs False Alarm Rate
• ROC Curves represent the trade-off between detection rate and false alarm rate
• Plot of detection rate vs. false alarm rate
• Ideal system should have 100% detection rate with 0% false alarms
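A minimal sketch of how ROC points are obtained: sweep a threshold over a detector's anomaly scores and, at each threshold, record the false alarm rate and the detection rate. The scores and labels are hypothetical.

```python
def roc_points(scores, labels):
    """Return (false_alarm_rate, detection_rate) pairs, one per distinct threshold.

    `labels` are 1 for intrusions and 0 for normal connections; a connection is
    flagged when its score is >= the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

# Hypothetical anomaly scores (higher = more suspicious) and ground-truth labels.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

for far, dr in roc_points(scores, labels):
    print(f"false alarm rate {far:.2f}  detection rate {dr:.2f}")
```

Lowering the threshold moves along the curve from (0, 0) to (1, 1); the ideal detector would pass through (0, 1), detecting every intrusion with no false alarms.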
Standard metrics for evaluations of intrusions (attacks)
| Actual \ Predicted connection label | Normal | Intrusion (attack) |
|---|---|---|
| Normal | True Negative (TN) | False Alarm (FP) |
| Intrusion (attack) | False Negative (FN) | Correctly detected intrusions (TP), which determine the detection rate |
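Written out from the four outcomes TP, FN, FP and TN, the two rates traded off on an ROC curve follow the standard definitions below; the counts in the second line are hypothetical, chosen only to show the arithmetic (TP = 90, FN = 10, FP = 50, TN = 9850).

$$\text{detection rate} = \frac{TP}{TP + FN}, \qquad \text{false alarm rate} = \frac{FP}{FP + TN}$$

$$\frac{90}{90 + 10} = 0.90, \qquad \frac{50}{50 + 9850} \approx 0.0051$$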
[Figure: ROC curves for different outlier detection techniques (LOF approach, NN approach, Mahalanobis approach, unsupervised SVM); x-axis: false alarm rate (0 to 0.10); y-axis: detection rate (0 to 1)]
[Figure: overview of different IDS techniques]
Machine learning techniques used in intrusion detection
• Supervised learning
– Support Vector Machines for classification and anomaly detection
– Some use of neural networks
– Various probabilistic models, such as Naïve Bayes variations
• Unsupervised learning
– Hidden Markov models (HMMs) and more complex variations thereof
– Various clustering algorithms, mixtures of Gaussians (MoG), KNNs
– Dimensionality reduction algorithms (kernel PCA)
• Other
– AdaBoost, mixtures of experts
• Disclaimer
– Not all are used in end products, and unfortunately we cannot say which techniques are used in which applications
Example: Malicious JavaScript
• Normal document classification works on the presence of “words” in files
• It’s also possible to encapsulate other information in models
– E.g. Naïve Bayes classifiers for email use pseudo-words such as “sender-tld:info”, “sender-tld:com”, “address-known:false” and “address-known:true” to improve accuracy
• Similar methods can be used with JavaScript
• Extract words (though not all words) and other features of interest
• Feed these to a machine learning model
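A minimal sketch of the feature-extraction step, assuming a simple regex tokenizer: string literals are set aside, selected identifier-like words are kept, and a few pseudo-words in the style of “sender-tld:com” encode structural hints. The keyword list, the pseudo-word names (`has-eval:…`, `long-hex-string:…`, `max-identifier-length:…`) and the thresholds are hypothetical choices, not features any particular product is known to use.

```python
import re

# Hypothetical keywords worth keeping as "words"; other identifiers are dropped.
KEEP = {"eval", "unescape", "fromCharCode", "parseInt", "document", "write", "substr"}

def js_features(source):
    """Turn JavaScript source into a bag of words plus hypothetical pseudo-words."""
    code_only = re.sub(r"'[^']*'|\"[^\"]*\"", " ", source)  # blank out string literals
    identifiers = re.findall(r"[A-Za-z_$][A-Za-z0-9_$]*", code_only)
    words = [tok for tok in identifiers if tok in KEEP]
    # Pseudo-words encode structural properties, as "sender-tld:com" does for email.
    longest = max((len(tok) for tok in identifiers), default=0)
    has_hex_blob = re.search(r"['\"][0-9A-Fa-f]{40,}", source) is not None
    words.append(f"has-eval:{'true' if 'eval' in identifiers else 'false'}")
    words.append(f"long-hex-string:{'true' if has_hex_blob else 'false'}")
    words.append(f"max-identifier-length:{'long' if longest >= 12 else 'short'}")
    return words

sample = "document.write(decode('3C5343524950543E77696E646F772E7374617475733D'));"
print(js_features(sample))
# ['document', 'write', 'has-eval:false', 'long-hex-string:true', 'max-identifier-length:short']
```

The resulting lists of words can be fed to any bag-of-words classifier, such as the Naïve Bayes models used for email.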
Example: Malicious JavaScript
• Complications arise due to the extreme use of obfuscation techniques by attackers
– And also by legitimate vendors (e.g. Google)
– And by large Web 2.0 libraries
function v46f658f5e2260(v46f658f5e3226){ function v46f658f5e4207 () {return 16;}
return(parseInt(v46f658f5e3226,v46f658f5e4207()));}function
v46f658f5e61f4(v46f658f5e7174){ function v46f658f5ea0cd () {return 2;} var
v46f658f5e813e='';for(v46f658f5e9105=0;
v46f658f5e9105<v46f658f5e7174.length; v46f658f5e9105+=v46f658f5ea0cd()){
v46f658f5e813e+=(String.fromCharCode(v46f658f5e2260(v46f658f5e7174.substr(v46f658f5e9105, v46f658f5ea0cd()))));}return v46f658f5e813e;}
document.write(v46f658f5e61f4('3C5343524950543E77696E646F772E7374617475733D2
'));
• The above is JavaScript, but where are the features?
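Reading the snippet closely shows what it hides: the first function is parseInt(s, 16), and the second walks the string two characters at a time, so together they decode a hex string into characters that are then passed to document.write. A minimal Python equivalent, applied only to the literal exactly as it appears on the slide (which may be truncated there), recovers the hidden text; this suggests that a detector may need such de-obfuscation before the meaningful “words” become visible.

```python
def decode_hex_pairs(encoded):
    """Python equivalent of the obfuscated decoder: every two hex digits become one character."""
    # Mirrors the loop over substr(i, 2) and parseInt(..., 16); a trailing odd
    # digit is ignored here (the JavaScript would decode it as a control character).
    return "".join(chr(int(encoded[i:i + 2], 16)) for i in range(0, len(encoded) - 1, 2))

# The literal exactly as it appears on the slide.
literal = "3C5343524950543E77696E646F772E7374617475733D2"
print(decode_hex_pairs(literal))  # <SCRIPT>window.status=
```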
Challenges for ML based intrusion detection
• Large data size
– E.g. millions of network connections are common for commercial network sites, …
• High dimensionality
– Hundreds of dimensions are possible
• Temporal nature of the data
– Data points close in time are highly correlated
• Skewed class distribution
– Interesting events are very rare: looking for the “needle in a haystack”
• Data preprocessing
– Converting data from the monitored system into data appropriate for analysis
• High Performance Computing (HPC) is critical for on-line, scalable and distributed intrusion detection
“Mining needle in a haystack. So much hay and so little time”
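Why the skewed class distribution matters, as a minimal sketch with hypothetical numbers (1,000,000 connections, of which 100 are attacks): a detector that labels everything as normal scores 99.99% accuracy while detecting nothing, and even a good detector with a 1% false alarm rate buries its real alerts among false ones.

```python
def rates(tp, fn, fp, tn):
    """Accuracy, detection rate and false alarm rate from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    detection_rate = tp / (tp + fn)
    false_alarm_rate = fp / (fp + tn)
    return accuracy, detection_rate, false_alarm_rate

attacks, normal = 100, 999_900  # hypothetical: 1 attack per 10,000 connections

# Trivial detector: call everything normal.
print(rates(tp=0, fn=attacks, fp=0, tn=normal))
# (0.9999, 0.0, 0.0): 99.99% accuracy, yet no attack is detected.

# Detector that catches 90 attacks but raises false alarms on 1% of normal traffic.
acc, dr, far = rates(tp=90, fn=10, fp=9_999, tn=normal - 9_999)
print(f"accuracy {acc:.4f}, detection rate {dr:.2f}, "
      f"share of alerts that are real {90 / (90 + 9_999):.3f}")
```

This is why intrusion detectors are compared on detection and false alarm rates rather than on accuracy.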
Inappropriate Content Detection
• Basically just document classification
• Want to stop bad sites and/or messages by content
– Porn, hate, ...
• Good classifiers: naïve Bayes
– Multinomial, Bernoulli → multinomial mixture model
• These have problems; in practice, IR techniques such as TF-IDF are added
• SVM approaches work better
• Also topic-based approaches: LSA / LDA
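TF-IDF reweights word counts so that words frequent in one document but rare across the collection dominate. A minimal sketch of one common variant, tf(t, d) × log(N / df(t)), on hypothetical toy documents; real systems and libraries use several smoothed variants of this formula.

```python
import math
from collections import Counter

def tf_idf(docs):
    """TF-IDF weights per document: raw term count * log(N / document frequency)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

docs = [
    "free prize click here claim your prize",
    "meeting notes for the project click here",
    "project deadline moved to friday",
]
w = tf_idf(docs)
# "prize" is frequent in doc 0 and absent elsewhere, so it gets the top weight there;
# "click" and "here" appear in two documents and are weighted down.
print(max(w[0], key=w[0].get))  # prize
```

Note that a term present in every document gets weight log(1) = 0, which is how TF-IDF suppresses uninformative words without a hand-written stop list.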
Identifying phishing attempts
• Also a form of document classification
• A priori probabilities for given URLs can help in classifying phishing attempts
• Techniques:
– Naïve Bayes
– SVMs and other supervised classifiers
• Mail filters can be trained on-line, using semi-supervised approaches
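A minimal sketch of a multinomial naïve Bayes phishing classifier with Laplace smoothing. Training only increments word counts, so the model can be updated on-line, one message at a time. Following the idea of a priori URL probabilities, the class prior can be overridden by a per-URL reputation value when one is known; the `url_prior` parameter, the messages and the prior value below are hypothetical.

```python
import math
from collections import Counter

class NaiveBayesPhishing:
    """Multinomial naive Bayes with Laplace smoothing, updatable one message at a time."""

    def __init__(self):
        self.word_counts = {"phish": Counter(), "ham": Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def update(self, text, label):
        """On-line training step: add one labelled message to the counts."""
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def log_likelihood(self, words, label):
        """log P(words | label) with add-one (Laplace) smoothing."""
        total = sum(self.word_counts[label].values())
        v = len(self.vocab)
        return sum(math.log((self.word_counts[label][w] + 1) / (total + v)) for w in words)

    def prob_phish(self, text, url_prior=None):
        """P(phish | words); `url_prior` (hypothetical) replaces the class prior for a known URL."""
        words = text.lower().split()
        if url_prior is None:
            url_prior = self.doc_counts["phish"] / sum(self.doc_counts.values())
        log_p = math.log(url_prior) + self.log_likelihood(words, "phish")
        log_h = math.log(1 - url_prior) + self.log_likelihood(words, "ham")
        # Normalise in log space to avoid underflow on long messages.
        m = max(log_p, log_h)
        return math.exp(log_p - m) / (math.exp(log_p - m) + math.exp(log_h - m))

nb = NaiveBayesPhishing()
nb.update("verify your account password now", "phish")
nb.update("your bank account is suspended verify now", "phish")
nb.update("lunch meeting moved to noon", "ham")
nb.update("project report attached for review", "ham")

msg = "please verify your password"
print(round(nb.prob_phish(msg), 3))                   # no URL information: high
print(round(nb.prob_phish(msg, url_prior=0.001), 3))  # link to a well-reputed URL: low
```

The same text is scored very differently depending on the prior attached to its URL, which is how a reputation model and a content model can be combined in a single classifier.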
Any questions?