Benchmarking Anomaly-based Detection Systems


Ashish Gupta

Network Security

May 2004

Overview

• The motivation for this paper
– Waldo example
• The approach
• Structure in data
• Generating the data and anomalies
• Injecting anomalies
• Results
– Training and testing: the method
– Scoring
– Presentation
– The ROC curves: somewhat obvious

Motivation

Does anomaly detection depend on the regularity/randomness of the data?

Where’s Waldo !


The aim

• Hypothesis:
– Differences in data regularity affect anomaly detection
– Different environments have different regularity
• Regularity
– Highly redundant or random?
– Example of an environment's effect:

010101010101010101010101

or

0100011000101000100100101

Consequences

One IDS: different false alarm rates

Need a custom system or custom training for each environment?

Temporal effects: regularity may vary over time

Structure in data

Measuring randomness

010101010101010101010101

or

0100011000101000100100101

Measuring Randomness

Relative entropy + sequential dependence

Conditional relative entropy
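The slides name conditional relative entropy as the randomness measure but do not give a formula. The sketch below is one plausible reading, not the paper's definition: it assumes a first-order model and computes the entropy of the next symbol given the current one, normalized by log2 of the alphabet size so that 0 means fully predictable and values near 1 mean close to random. The function name and the normalization are assumptions made for illustration.

```python
import math
from collections import Counter

def conditional_relative_entropy(seq):
    """Entropy of the next symbol given the current one, scaled to [0, 1].

    Assumption: first-order dependence and normalization by log2(alphabet size).
    """
    alphabet = set(seq)
    if len(seq) < 2 or len(alphabet) < 2:
        return 0.0
    pair_counts = Counter(zip(seq, seq[1:]))   # counts of (current, next)
    first_counts = Counter(seq[:-1])           # counts of current symbol
    total = len(seq) - 1
    h = 0.0
    for (current, _next), n in pair_counts.items():
        p_pair = n / total                     # P(current, next)
        p_cond = n / first_counts[current]     # P(next | current)
        h -= p_pair * math.log2(p_cond)
    return h / math.log2(len(alphabet))

regular = "010101010101010101010101"
irregular = "0100011000101000100100101"
regular_index = conditional_relative_entropy(regular)
irregular_index = conditional_relative_entropy(irregular)
```

On the two example strings from the slide, the alternating sequence scores 0 because each symbol fully determines the next, while the second sequence scores above 0.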

The benchmark datasets

• Three types:
– Training data (the background data)
– Anomalies
– Testing data (background + anomalies)
• Generating the sequences
– 5 sets, each set 11 files (for increasing regularity)
– Each set has a different alphabet size
– Alphabet size decides complexity
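The slides do not say how the background sequences were produced. As a rough illustration only, the sketch below generates one set of 11 sequences over a fixed alphabet, moving from fully regular to fully random. The generator (a deterministic cycle through the alphabet, interrupted by uniformly random symbols with probability `randomness`) and all names are hypothetical stand-ins, not the paper's procedure.

```python
import random

def generate_background(alphabet, length, randomness, seed=0):
    """Hypothetical generator: cycle through the alphabet, but with
    probability `randomness` emit a uniformly random symbol instead."""
    rng = random.Random(seed)
    seq = [alphabet[0]]
    for _ in range(length - 1):
        if rng.random() < randomness:
            seq.append(rng.choice(alphabet))
        else:
            nxt = (alphabet.index(seq[-1]) + 1) % len(alphabet)
            seq.append(alphabet[nxt])
    return "".join(seq)

# One set: 11 files, from fully regular (0.0) to fully random (1.0).
levels = [i / 10 for i in range(11)]
files = {r: generate_background("ABCD", 1000, r, seed=42) for r in levels}
```

Repeating this with alphabets of different sizes would give the other sets.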

Anomaly Generation

• What's a surprise?
– Different from the expected probability
• Types:
– Juxtapositional: different arrangements of data
  • 001001001001001001111
– Temporal
  • Unexpected periodicities
– Other types?

Types in this paper

• Foreign symbol
– AAABABBBABABCBBABABBA
• Foreign n-gram
– AAABABAABAABAAABBBBA
• Rare n-gram
– AABBBABBBABBBABBBABBBABBAA
• Injecting anomalies
– Make sure not more than 0.24 %
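To make the three anomaly types concrete, the sketch below classifies an n-gram relative to a training sequence and splices an anomaly into a background string. It is a minimal illustration under stated assumptions: the function names, the `rare_threshold` cutoff of 0.05, and the splice-by-position injection are all hypothetical and not taken from the paper.

```python
from collections import Counter

def ngrams(seq, n):
    """All overlapping n-grams of seq, in order."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]

def classify_ngram(gram, training, rare_threshold=0.05):
    """Label gram relative to training data (rare_threshold is an assumption)."""
    symbols = set(training)
    counts = Counter(ngrams(training, len(gram)))
    total = sum(counts.values())
    if any(sym not in symbols for sym in gram):
        return "foreign symbol"      # contains a symbol never seen in training
    if counts[gram] == 0:
        return "foreign n-gram"      # known symbols, unseen arrangement
    if counts[gram] / total < rare_threshold:
        return "rare n-gram"         # seen, but with low relative frequency
    return "common"

def inject(background, anomaly, position):
    """Splice an anomaly into the background at the given index."""
    return background[:position] + anomaly + background[position:]

training = "AB" * 20 + "B"
labels = {g: classify_ngram(g, training) for g in ("AC", "AA", "BB", "AB")}
test_data = inject("AAAA", "C", 2)
```

Here "AC" is a foreign symbol, "AA" a foreign n-gram, "BB" a rare n-gram (1 of 40 bigrams), and "AB" is common.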

The experiments

The Hypothesis is true

• The hypothesis:
– Nature of "normal" background noise affects signal detection
• The anomaly detector
– Detects anomalous subsequences
– Learning phase: build an n-gram probability table
– Unexpected event: anomaly!
– Anomaly threshold decides level of surprise
• Example of anomaly detection

AAA 0.12

AAB 0.13

ABA 0.20

BAA 0.17

BBB 0.15

BBA 0.12

AAC ANOMALY !
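The detector described above can be sketched in a few lines. This is an illustrative reconstruction, not the paper's implementation: it assumes the probability table holds relative n-gram frequencies from training, and that surprise is scored as 1 minus that probability, so an unseen n-gram such as AAC scores 1. The function names and the default threshold are assumptions.

```python
from collections import Counter

def train(seq, n=3):
    """Learning phase: relative frequency of every n-gram in the training data."""
    grams = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    counts = Counter(grams)
    return {g: c / len(grams) for g, c in counts.items()}

def surprise(table, gram):
    """Assumed score: 0 = completely expected, 1 = never seen in training."""
    return 1.0 - table.get(gram, 0.0)

def detect(table, seq, n=3, threshold=0.99):
    """Return (index, n-gram) for every window whose surprise meets the threshold."""
    flagged = []
    for i in range(len(seq) - n + 1):
        gram = seq[i:i + n]
        if surprise(table, gram) >= threshold:
            flagged.append((i, gram))
    return flagged

table = train("AB" * 10)              # only ABA and BAB are ever seen
alarms = detect(table, "ABABCBAB")    # the foreign symbol C sits at index 4
```

Every window that overlaps the foreign symbol is flagged, while the windows made of familiar n-grams are not.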

Scoring

• Event outcomes
– Hits
– Misses
– False alarms
• Threshold
– Decides level of surprise
– 0 completely unsurprising, 1 astonishing
– Need to calibrate
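The three event outcomes can be counted by comparing the positions a detector flags with the positions where anomalies were injected. The sketch below is one simple way to do that. The function name, the position-based matching, and the choice of denominators for the two rates are assumptions made for illustration.

```python
def score(flagged, injected, normal_count):
    """Count hits, misses and false alarms from sets of positions.

    normal_count is the number of non-anomalous events (assumed denominator
    for the false alarm rate).
    """
    flagged, injected = set(flagged), set(injected)
    hits = len(flagged & injected)            # anomalies that were flagged
    misses = len(injected - flagged)          # anomalies that went unnoticed
    false_alarms = len(flagged - injected)    # normal events that were flagged
    return {
        "hits": hits,
        "misses": misses,
        "false_alarms": false_alarms,
        "hit_rate": hits / len(injected) if injected else 0.0,
        "false_alarm_rate": false_alarms / normal_count if normal_count else 0.0,
    }

result = score(flagged={2, 3, 9}, injected={2, 3, 4}, normal_count=10)
```

In this toy case two of three injected anomalies are hit, one is missed, and one normal event raises a false alarm.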

Presentation of results

• Presents two aspects:
– % correct detections
– % false detections
• Detector operates through a range of sensitivities
– Higher sensitivity?
– Need the right sensitivity
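Sweeping the detector through a range of thresholds and recording both rates at each setting gives the points of a ROC curve. The sketch below shows the idea on made-up scores. The function name, the toy data, and the convention that an event is flagged when its surprise score is at or above the threshold are assumptions.

```python
def roc_points(scores, labels, thresholds):
    """For each threshold, return (false alarm rate, hit rate).

    scores: surprise score per event; labels: 1 for anomaly, 0 for normal.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        hits = sum(1 for s, y in zip(scores, labels) if y and s >= t)
        false_alarms = sum(1 for s, y in zip(scores, labels) if not y and s >= t)
        points.append((
            false_alarms / negatives if negatives else 0.0,
            hits / positives if positives else 0.0,
        ))
    return points

# Toy data (fabricated for illustration only, not results from the paper).
scores = [0.2, 0.4, 0.6, 0.9, 1.0]
labels = [0, 0, 0, 1, 1]
points = roc_points(scores, labels, thresholds=[0.0, 0.5, 0.95])
```

A low threshold flags everything, so both rates are 1. Raising the threshold trades hits for fewer false alarms, which is why the right sensitivity has to be chosen per dataset.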

Interpretation

• Nothing overlaps: regularity affects detection!
• What does this mean?
• Detection metrics are data dependent
• Cannot say:
– "My XYZ product will flag 75% of anomalies with a 10% false hit rate!"
– "Sir, are you sure?"

Real world data

• Regularity index for system calls for different users
• Is this surprising?
• What about network traffic?

Conclusions

Data structure affects anomaly detection effectiveness

Evaluation is data dependent

Conclusions

Change in regularity: different system, or change the parameters

Quirks ?

• Assumes rather naïve detection systems
– "Simple retraining will not suffice"
• An intelligent detection system can take this into account.
• What is really an anomaly?
– If data is highly irregular, won't randomness produce some anomalies by itself?
• Anomaly is a relative term
– Here anomalies are generated independently
