Machine Learning - slides.yowconference.com

Copyright Cognomotiv 2016

Machine Learning

No: It Can’t Do That!

Hadi Nahari

hadi@cognomotiv.com

hadinahari

“Friends, Romans, countrymen, lend me your ears;

I come to bury Caesar, not to praise him.

The evil that men do lives after them…”

Julius Caesar

Act 3, Scene II

Setup

• ML + NetSec

National Academy of Engineering: Grand Challenges for the 21st Century

“The best minds of my generation are thinking about how to make people click ads.” -- Jeff Hammerbacher

Agenda

• Motivations

• Machine Learning 101

• ML & Network Security

• What Works, What Doesn’t

• Conclusion

Agenda: MOTIVATIONS

ML Is NOT New

• This is the 5th round…

ML is HOT!!

• VCs fund ML-companies like crazy

• Amazing new fields have opened

– Autonomous driving, behavior analytics, etc.

• A ton of existing fields have been revived

– Search, personalization/customization, audio processing, image processing, etc., etc.

• Mainly because…

Code Complexity

• Space Shuttle: ~400K LOC

• F22 Raptor fighter: ~2M LOC

• Linux kernel 2.2: ~2.5M LOC

• Hubble telescope: ~3M LOC

• Android core: ~12M LOC

• Future Combat Sys.: ~63M LOC

• Connected car: ~100M LOC

• Autonomous vehicle: ~300M LOC

• Large Hadron Collider: 60M LOC

50 M LOC

Usecase Complexity

• Service provider

• On avg. only five passwords per 40 online accounts per user

• Where to store the tokens?

Data Procreation

• >2 billion GB of new data is created every day

– 2.3283006436538696 B GB to be exact

• Sparse data: mainly 0s

• In ‘93 the information on the internet surpassed all information that humanity had created before it

Stack Proliferation

HW Architecture(s)

Applications

Algorithms

Agenda: ML 101

Machine Learning (ML)

• Study of pattern recognition & computational learning theory in Artificial Intelligence (AI)

• Algorithms to learn from, and make predictions on data

• As opposed to following strictly static program instructions

ML Models

• Supervised learning

• Unsupervised learning

• (Semi-supervised learning)

• Reinforcement learning

Supervised Learning

• {(labeled) Input} [map] {Expected Output}

• Find [map]
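A minimal sketch of "find [map]", assuming the map is a straight line and the labeled pairs are made up for illustration. The names `training_pairs`, `fit_line` and `predict` are hypothetical, and ordinary least squares is only one of many ways to learn a map.

```python
# Hypothetical labeled training pairs (input, expected output); here y = 2x + 1.
training_pairs = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

def fit_line(pairs):
    """Find the [map] y = slope * x + intercept by ordinary least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(training_pairs)

def predict(x):
    """Apply the learned map to an input that was not in the training set."""
    return slope * x + intercept
```

Once fitted, `predict(10.0)` applies the learned map to an unseen input.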

Supervised Learning Model

Unsupervised Learning

• {(unlabeled) Input} [map] {Output}

• Find structure (patterns) in {Input}
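As a rough illustration of finding structure in unlabeled input, the sketch below runs a tiny 1-D k-means (Lloyd's algorithm). The data, the function name `kmeans_1d` and the deterministic initialisation are assumptions chosen to keep the example small; it expects `k >= 2`.

```python
def kmeans_1d(values, k=2, iterations=10):
    """Group unlabeled numbers into k clusters (Lloyd's algorithm, 1-D, k >= 2)."""
    ordered = sorted(values)
    # Deterministic initialisation: spread the starting centers over the sorted data.
    centers = [ordered[(i * (len(ordered) - 1)) // (k - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical unlabeled measurements with two obvious groups.
centers, clusters = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 10.5], k=2)
```

No labels are supplied anywhere: the two groups emerge from the distances between the values alone.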

Unsupervised Learning Model

Reinforcement Learning

• No correct {Input}/{Output}

• Action, environment, reward
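The action, environment, reward loop can be sketched with tabular Q-learning on a toy corridor. Everything here is an assumption for illustration: the five-state world, the reward of 1.0 at the goal, the discount factor and the purely random exploration.

```python
import random

N_STATES = 5          # states 0..4 in a corridor; state 4 is the goal
ACTIONS = (-1, +1)    # index 0 = step left, index 1 = step right
GAMMA = 0.5           # discount factor (assumed)
ALPHA = 1.0           # learning rate; 1.0 is safe only because this toy world is deterministic

def step(state, action_index):
    """Environment: apply the action, return (next_state, reward, done)."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=50, seed=0):
    """Learn action values from rewards only; no labeled examples are given."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.randrange(2)  # explore uniformly at random (Q-learning is off-policy)
            next_state, reward, done = step(state, action)
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q

q_table = train()
greedy_policy = ["right" if q[1] > q[0] else "left" for q in q_table[:-1]]
```

The agent is never told that "right" is correct; the preference appears in `q_table` because only that direction eventually leads to a reward.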

Reinforcement Learning Model

Main ML Approaches

• Decision Tree Learning, Association Rule Learning

• Inductive Logic Programming, Support Vector Machines, Clustering, Bayesian Networks

• Representation Learning, Genetic Algorithms

• Similarity and Metric Learning, Sparse Dictionary Learning

• Artificial Neural Networks (ANN), Deep Learning (DL)

Neural Network

• Interpret an Artificial Intelligence (AI) task as the evaluation of complex functions

– Facial Recognition: Map a bunch of pixels to a name

– Handwriting Recognition: Image to a character

• NN: Network of interconnected simple neurons

The Neuron

Feed-forward system, made up of two stages:

• Linear transformation of data

• Point-wise application of a non-linear function

Inputs: $X_1, X_2, X_3$; weights: $W_1, W_2, W_3$

$y = F\left(\sum_i W_i X_i\right)$

$F(x) = \max(0, x)$ is the Rectified Linear Unit (ReLU); sigmoid and other non-linearities are also used
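The two stages of a neuron can be written out directly. This is a minimal sketch: the input and weight values are hypothetical, and `neuron` and `relu` are illustrative names rather than any library's API.

```python
def relu(x):
    """F(x) = max(0, x)."""
    return max(0.0, x)

def neuron(inputs, weights, activation=relu):
    """Stage 1: linear transformation (weighted sum). Stage 2: non-linearity."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum)

# Hypothetical values for X1..X3 and W1..W3.
y_active = neuron([1.0, 2.0, 3.0], [0.5, -1.0, 1.0])    # weighted sum is 1.5
y_clipped = neuron([1.0, 2.0, 3.0], [0.5, -1.0, -1.0])  # weighted sum is -4.5, clipped to 0.0
```

Passing a different `activation` (for example a sigmoid) changes only the second stage.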

Artificial Neural Network (ANN)

• Layers and layers of neurons, with many connections

(Figure: network diagram from Input to Output)
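To make "layers and layers of neurons" concrete, here is a small forward pass through two fully connected layers. The weights are hand-picked, not trained, and the bias terms are an added assumption that the slides do not show.

```python
def relu(x):
    return max(0.0, x)

def dense_layer(inputs, weight_rows, biases):
    """One layer: every neuron takes a weighted sum of all inputs, then ReLU."""
    return [
        relu(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weight_rows, biases)
    ]

def forward(inputs, layers):
    """Feed the output of each layer into the next one."""
    activations = inputs
    for weight_rows, biases in layers:
        activations = dense_layer(activations, weight_rows, biases)
    return activations

# Hypothetical, hand-picked weights: 2 inputs -> 2 hidden neurons -> 1 output.
network = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.0]),
]
output = forward([3.0, 1.0], network)
```

With these particular weights the network computes the absolute difference of its two inputs, something a single ReLU neuron cannot do on its own.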

Deep Learning (DL)

• Branch of ML based on a set of algorithms that:

– Attempt to model high-level data abstractions

– Are based on learning representations of data

– Use complex architectures with multiple non-linear transformations

• Some representations make it easier to learn tasks from examples (e.g. AlphaGo)

DNN: Learning Feature Representation

(Figure: Input to Result)

DNN: Feature Engineering

Anything humans can do in 0.1 sec, the right, big 10-layer network can do too

• Images/video: Image → Vision features → Detection

• Audio: Audio → Audio features → Speaker ID

• Text: Text → Text features → Text classification, machine translation, information retrieval, ...

ML/DL Improve With Scale

(Figure: Performance vs. Data & Compute across Past, Present, Future; curves labeled “ML / DL” and “Many previous methods”)

Agenda: ML & NETSEC

Intrusion & Intrusion Detection

“Intrusion is an attempt to compromise CIA (Confidentiality, Integrity, Availability), or to bypass the security mechanisms of a computer or network”

“Intrusion detection is the process of monitoring the events occurring in a computer system or network, and analyzing them for signs of an intrusion”

3 Main Detection Methodologies

• Signature-based Detection (SD)

– Signature: pattern corresponding to a known attack or threat

– SD: process of comparing signatures against captured events

– A.k.a. “Knowledge-based Detection”

• Anomaly-based Detection (AD)

– Anomaly: a deviation from “normal” behavior

– Profile of normal is derived from monitoring network traffic

– AD compares the normal profile with observed events

• Stateful Protocol Analysis (SPA)

– Vendor-developed generic profiles for specific protocols
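The difference between the first two methodologies can be sketched in a few lines. The signatures, the request rates and the `slack` factor below are all made up for illustration; real systems use far richer signatures and profiles.

```python
# Hypothetical signatures: byte patterns tied to known attacks.
SIGNATURES = {
    "path-traversal": b"../../",
    "sql-injection": b"' OR '1'='1",
}

def signature_detect(payload):
    """SD: compare known patterns against a captured event."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]

def build_profile(normal_request_rates, slack=1.5):
    """AD, step 1: derive a 'normal' profile from monitored traffic."""
    return {"max_rate": max(normal_request_rates) * slack}

def anomaly_detect(profile, observed_rate):
    """AD, step 2: flag events that deviate from the normal profile."""
    return observed_rate > profile["max_rate"]

profile = build_profile([90, 110, 100, 120])  # requests per minute, made up
known_attack = signature_detect(b"GET /../../etc/passwd")
novel_attack = signature_detect(b"GET /?q=never-seen-before-exploit")
```

The signature check only fires on patterns it already knows, so `novel_attack` comes back empty; the anomaly check needs no signature but depends entirely on how well the profile captures "normal".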

Cybersecurity System

• Attacks evolve, ergo building defense systems is nontrivial

• Thus, higher-level & adaptive methodologies are required

Adaptive Cybersecurity

• Data-capturing tools (libpcap, WinPcap, etc.) capture events from the audit trails of information sources (e.g. network)

• Data-preprocessing module filters out the attacks for which good signatures have already been learned

• A feature-extractor derives basic features (sequence of syscalls, start time, NetFlow duration, src/dest IP/port, protocol, byte and packet counts)

• Analysis engine implements detection methods for infrastructure anomalies, which may or may not have appeared before
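A feature extractor of the kind listed above might look like the following sketch. The `FlowRecord` fields and the choice of derived features are hypothetical; they loosely mirror the NetFlow-style attributes named on the slide.

```python
from dataclasses import dataclass

@dataclass
class FlowRecord:
    """One captured flow; field names are hypothetical, loosely NetFlow-like."""
    start_time: float
    duration: float
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    byte_count: int
    packet_count: int

PROTOCOL_IDS = {"tcp": 6, "udp": 17, "icmp": 1}  # IANA protocol numbers

def extract_features(flow):
    """Turn a flow record into a numeric feature vector for an analysis engine."""
    duration = max(flow.duration, 1e-9)  # avoid division by zero on instant flows
    return [
        flow.duration,
        float(PROTOCOL_IDS.get(flow.protocol.lower(), 0)),
        float(flow.dst_port),
        float(flow.byte_count),
        float(flow.packet_count),
        flow.byte_count / max(flow.packet_count, 1),  # mean packet size
        flow.byte_count / duration,                   # throughput in bytes/s
    ]

flow = FlowRecord(0.0, 2.0, "10.0.0.5", 51514, "10.0.0.9", 443, "TCP", 3000, 6)
features = extract_features(flow)
```

Derived features such as mean packet size are a design choice here; which features actually help is exactly the hard part the later slides discuss.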

Agenda: WHAT WORKS, WHAT DOESN’T

Curse of Dimensionality

• Data volume is massive

– min. ~100M events per day

• Much of the data is streaming data

– Requires inline, real-time analysis

• Feature space is high dimensional
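Inline analysis of streaming data usually means statistics have to be updated one event at a time, without storing the stream. One common way to do that is Welford's online algorithm, sketched below with made-up event sizes; the class name `RunningStats` is illustrative.

```python
class RunningStats:
    """Welford's online algorithm: mean and variance in one pass, O(1) memory."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        """Sample variance; 0.0 until at least two values have been seen."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0

stats = RunningStats()
for event_size in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(event_size)
```

This handles volume for a single feature; it does nothing about the dimensionality problem, where the number of features itself is large.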

$/Detection Performance Abysmal

• Looking for “every anomaly” is cost prohibitive

– if at all [practically] possible

• Narrowing down the criteria too much

– results in false negatives

• Reference data is hard to obtain due to privacy concerns

– Simulated data is useless

• ML was supposed to be better than the signature era
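A back-of-the-envelope calculation shows why cost per detection gets so poor at this scale. Only the 100M events per day comes from the slides; the number of attack events and both detector rates are assumptions picked for illustration.

```python
def alert_budget(events_per_day, attack_events, true_positive_rate, false_positive_rate):
    """Return (true alerts, false alerts, precision) for one day of traffic."""
    benign = events_per_day - attack_events
    true_alerts = attack_events * true_positive_rate
    false_alerts = benign * false_positive_rate
    precision = true_alerts / (true_alerts + false_alerts)
    return true_alerts, false_alerts, precision

true_alerts, false_alerts, precision = alert_budget(
    events_per_day=100_000_000,  # the slide's minimum daily volume
    attack_events=100,           # assumption
    true_positive_rate=0.99,     # assumption
    false_positive_rate=0.001,   # assumption
)
```

Under these assumptions a detector that looks excellent on paper still buries about 99 real alerts under roughly 100,000 false ones per day, so fewer than one alert in a thousand is worth an analyst's time.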

Husky Recognition

• We built an effective snow recognition model…

Learned Features

Models: Simple Correlations

• Simple models are also (usually) wrong

Network Anomalies

• Malicious data packets have little variety (low type count) but occur at high frequency

– Current models are not good at detecting this type of anomaly

• Anomaly/outlier varies among application domains

• Labeled anomalies are not available for training/validation
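The first bullet can be illustrated with a deliberately naive outlier detector that equates "rare" with "suspicious". The packet types and counts are invented, and rarity scoring is just one simple stand-in for the models the slide refers to.

```python
from collections import Counter

def rarity_scores(packet_types):
    """Score each type by how rare it is: 1 - relative frequency."""
    counts = Counter(packet_types)
    total = len(packet_types)
    return {ptype: 1 - count / total for ptype, count in counts.items()}

# Hypothetical capture: a flood of one malicious type hidden in ordinary traffic.
capture = ["http"] * 300 + ["dns"] * 195 + ["syn-flood"] * 500 + ["ntp"] * 5
scores = rarity_scores(capture)
most_suspicious = max(scores, key=scores.get)
least_suspicious = min(scores, key=scores.get)
```

Because the flood is the most frequent type in the capture, this detector ranks it as the least suspicious traffic and points at harmless but rare NTP packets instead.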

Baselining

• Using ML to detect anomalies is easy when the baseline is well-defined and follows a simple mathematical model (e.g. Normal Distribution)

• Most real-world systems don’t render a simple baseline (i.e. their behavior is very complex)

• [!] Sanctity of baseline: “nearly 100% of networks are compromised”
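Both the easy case and the baseline-sanctity problem can be shown with a z-score test against a roughly Normal baseline. The traffic samples and the threshold of 3 standard deviations are assumptions for illustration.

```python
from statistics import mean, stdev

def z_score(baseline, observation):
    """Distance of an observation from the baseline mean, in standard deviations."""
    return (observation - mean(baseline)) / stdev(baseline)

def is_anomalous(baseline, observation, threshold=3.0):
    return abs(z_score(baseline, observation)) > threshold

# Hypothetical requests-per-minute samples from a clean network.
clean_baseline = [98, 101, 100, 99, 102, 100, 97, 103]
# Same network, but attack bursts were already present while baselining.
compromised_baseline = clean_baseline + [400, 420, 390, 410]

burst = 405
flagged_with_clean = is_anomalous(clean_baseline, burst)
flagged_with_compromised = is_anomalous(compromised_baseline, burst)
```

Against the clean baseline the burst is flagged immediately; once the attacker's traffic is part of the baseline, the same burst looks normal.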

Time Shifting

• “Window problem”: algorithms can only ingest data in chunks (windows) small enough to be processed

– What if the anomaly is seeded outside that window?

• Network traffic diversity: usage varies in every session and with new applications

– The window should also be shifted for recurring training

• Serious impact on performance, real-time behavior, and security
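A small sliding-window detector makes the window problem visible. The event streams, window size and threshold are hypothetical; the point is only that the same number of malicious events is caught or missed depending on how it is spread over time.

```python
from collections import deque

def windowed_alerts(events, window_size, threshold):
    """Alert whenever the last `window_size` events contain >= threshold failures."""
    window = deque(maxlen=window_size)
    alerts = []
    for index, is_failure in enumerate(events):
        window.append(is_failure)
        if sum(window) >= threshold:
            alerts.append(index)
    return alerts

# Hypothetical login streams: True marks a failed login.
fast_attack = [True] * 5 + [False] * 15
slow_attack = ([True] + [False] * 9) * 5  # same 5 failures, spread out

fast = windowed_alerts(fast_attack, window_size=10, threshold=5)
slow = windowed_alerts(slow_attack, window_size=10, threshold=5)
```

The fast attack triggers alerts as soon as the fifth failure arrives, while the slow attack never puts more than one failure inside any window and passes unnoticed. Enlarging the window catches it but raises memory and latency costs.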

There’s More…

• How do you trust what the model predicts?

– I.e. how do we know the model works correctly (husky)?

• Designing sound evaluation schemes can be more difficult than the detector itself

• We really don’t know how ML works

• … or how to reason about ML models

• … or how to debug them

• For now it’s just magic & voodoo
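One way to start reasoning about a prediction is to perturb the input and watch the output, in the spirit of the “Why Should I Trust You?” paper listed in the references. The sketch below is a much simpler stand-in, not that paper's method: a toy linear "wolf vs. husky" scorer with invented weights, probed by switching one feature off at a time.

```python
# Hypothetical classifier that learned the wrong cue: snow dominates the score.
WEIGHTS = {"snow_background": 3.0, "pointed_ears": 0.4, "yellow_eyes": 0.3}
BIAS = -1.5

def wolf_score(features):
    """Linear score; > 0 means the toy model predicts 'wolf'."""
    return BIAS + sum(WEIGHTS[name] * value for name, value in features.items())

def feature_influence(features):
    """Switch each feature off in turn and record how much the score drops."""
    base = wolf_score(features)
    return {
        name: base - wolf_score({**features, name: 0.0})
        for name in features
    }

husky_in_snow = {"snow_background": 1.0, "pointed_ears": 1.0, "yellow_eyes": 0.0}
influence = feature_influence(husky_in_snow)
top_feature = max(influence, key=influence.get)
```

Here the husky is scored as a wolf, and the probe shows the decision rests on the snowy background. Removing the snow flips the prediction, which is the kind of evidence that a model has learned the wrong thing.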

Agenda: CONCLUSION

Summary

• ML is a great and necessary technology

• ML really shines for some classes of problems

• ML is NOT the best solution for every problem (e.g. NetSec)

• Obtaining (and training with) useful data remains a challenge

• ML is just one initial building block of Machine Cognition and Artificial Understanding: there are many more

• Still a long way before machines can replicate humans!

THANK YOU!

Hadi Nahari

hadi@cognomotiv.com

hadinahari

Backup

References

• Prof. Karl Friston, seminal works (http://www.fil.ion.ucl.ac.uk/~karl/#_Free-energy_principle)

• “Why Should I Trust You?” Explaining the Predictions of Any Classifier, Carlos Guestrin, et al (https://arxiv.org/pdf/1602.04938.pdf)

• “Using Machine Learning in Network Intrusion Detection Systems”, Omar Shaya (http://www.slideshare.net/OmarShaya/machine-learning-in-networks-intrusion-detection?next_slideshow=1)

• “Machine Learning Is Not The Answer To Better Network Security”, Matt Harrigan (https://techcrunch.com/2016/02/29/machine-learning-is-not-the-answer-to-better-network-security/)

• “Machine Learning Algorithm Cheat Sheet”, Laura Diane Hamilton (http://www.lauradhamilton.com/machine-learning-algorithm-cheat-sheet)

• “Anomaly Detection Approaches for Communicating Networks” (http://users.ece.gatech.edu/~jic/anomaly-book-chap-09.pdf)

• “A Survey on Machine Learning Techniques for Intrusion Detection Systems”, J. Singh, N.J. Nene (http://ijarcce.com/upload/2013/november/35-o-jayveer_singh-A_Survey_on_Machine.pdf)

• “Machine Learning Techniques for Anomaly Detection: An Overview”, S. Omar, et al (http://research.ijcaonline.org/volume79/number2/pxc3891478.pdf)

• “Recent Advances in Predictive (Machine) Learning”, J.H. Friedman, et al (http://statweb.stanford.edu/~jhf/ftp/machine)

• “Outside the Closed World: On Using Machine Learning For Network Intrusion Detection”, R. Sommer, V. Paxson (http://www.utdallas.edu/~muratk/courses/dmsec_files/oakland10-ml.pdf)

• http://xkcd.com

Are Humans Getting Smarter?

• IQ scores are rising

• Underlying biological “HW” declining

• “Intelligence” is in decline