Activity monitoring: Anomaly detection as on-line classification. Tom Fawcett, HP Laboratories, 1501 Page Mill Rd., Palo Alto, CA. Symposium on Machine Learning for Anomaly Detection, May 22-23, 2004.


Page 1: Activity monitoring: Anomaly detection as on-line classification

Tom Fawcett
HP Laboratories
1501 Page Mill Rd.
Palo Alto, CA

Symposium on Machine Learning for Anomaly Detection
May 22-23, 2004

Page 2: Example: Intrusion detection

User 1    User 2    User 3    User 4
ls        ls        gcc       netscape
from      cd        a.out     netscape
cd        ls        vi        netscape
latex     from      gcc       acroread
gv        emacs     a.out     netscape
emacs     logout    vi        .
latex     su        gdb       .
xfig      ypcat     cd        .
...       ...       ...       ...

from finger pwd ls rlogin from pwd sudocalcsgccsudocalc pwd ...

One command sequence is marked as an intrusion.

Page 3: Example: Monitoring digital switch health

[Figure: event streams for switches S1, S2, S3, ..., Si; one stream shows abnormal behavior culminating in hard failure]

Page 4: Example: Monitoring business news

1. May 20, 1999 - VISX Inc. today announced that the Federal Trade Commission (FTC) has filed a notice of appeal of the decision issued earlier this month by the FTC administrative…

2. May 29 (Reuters) - Federal advisers backed approval Thursday of a laser made by VISX Inc. used to correct nearsightedness with or without astigmatism. […]

3. June 3 (PR Newswire) - VISX, Inc. of Santa Clara, California, will become a component of the Nasdaq-100 Index, effective at the beginning of trading Thursday, June 10, 1999. […]

4. June 4 (PR Newswire) - Amazon.com, facing threats of legal action from The New York Times, has asked the U.S. District Court in Seattle to allow Amazon.com to continue advertising.

5. June 5, 1999 - Motorola today announced that its MPC923, MPC950 and MPC960 PowerPC processors have been officially certified by Microsoft Corporation to support the…

6. June 8, 1999 (PR Newswire) - WebTV Networks, Inc. and EchoStar Communications Corp. at CES today announced the Microsoft WebTV Network Plus service for satellite and the EchoStar…

Page 5: Monitoring business news: VISX

1. May 20, 1999 - VISX Inc. today announced that the Federal Trade Commission (FTC) has filed a notice of appeal of the decision issued earlier this month by the FTC administrative…

2. May 29 (Reuters) - Federal advisers backed approval Thursday of a laser made by VISX Inc. used to correct nearsightedness with or without astigmatism. […]

3. June 3 (PR Newswire) - VISX, Inc. of Santa Clara, California, will become a component of the Nasdaq-100 Index, effective at the beginning of trading Thursday, June 10, 1999. […]

[Figure: VISX stock price]

Page 6: Commonalities of the domains

● Temporal: data comprise time series

● Large number of entities (users, companies, accounts, devices)

● Large volumes of data (commands, news stories, calls) on entity activity

● General goal is to alert on interesting, rare events (intrusions, fraud, unusual business activity)

Detection goals:

● Identify as many interesting events as possible

● Alert as soon as possible

● Minimize false alarms

[Figure label: Onset of significant activity]

Page 7: Activity monitoring problems

Intrusion detection
    Lee, Stolfo, Mok (KDD-99)
    Lane & Brodley (KDD-98)
    Ryan et al. (FDRM-97)
    DuMouchel & Schonlau (KDD-98)

Network alarm monitoring
    Sasisekharan et al. (KDD-94)
    Weiss & Hirsh (KDD-98)
    Klemettinen 99

Hardware fault detection
    Dasgupta & Forrest 96
    Smyth 92

Topic detection and tracking
    Allan, Papka & Lavrenko (SIGIR-98)
    Crabtree & Soltysiak (IJCAI-97)
    Allan (ed.), 2002

Fraud detection
    Chan & Stolfo (KDD-98)
    Cox et al. (DMKD-97)
    Fawcett & Provost (KDD-97)
    Burge & Shawe-Taylor (FDRM-97)
    Ezawa & Norton (KDD-95)

News/event alerts
    Yang, Pierce & Carbonell
    Fawcett & Provost (KDD-99)

Epidemic/bio-terrorism detection
    Wong et al. 2002, 2003
    Shmueli 2004

Page 8: Standard supervised learning approach

[Figure: an event stream with the onset of significant activity marked. Window vector extraction turns the stream into instance vectors w1, w2, w3, w4, ..., wn-1, wn, each with a class (- before the onset, + after it), which yields a classification problem.]

Many approaches use |w| = 1
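The window vector extraction step on this slide can be sketched roughly as follows. This is a minimal illustration, not the method of any paper cited in the talk: the function name `extract_windows`, the example stream, the onset index, and the rule that a window is positive when its last item falls at or after the onset are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def extract_windows(
    events: Sequence[str], onset: int, width: int = 1
) -> List[Tuple[Tuple[str, ...], str]]:
    """Slide a fixed-width window over an event stream and label each window.

    `onset` is the index of the first item of significant activity.
    Assumed labelling rule: a window is "+" if its last item is at or
    after the onset, otherwise "-".
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    instances = []
    for start in range(len(events) - width + 1):
        window = tuple(events[start:start + width])
        last_index = start + width - 1
        label = "+" if last_index >= onset else "-"
        instances.append((window, label))
    return instances


# Hypothetical command stream; significant activity starts at index 3.
stream = ["ls", "cd", "gcc", "su", "ypcat", "finger"]
instances = extract_windows(stream, onset=3, width=2)
for window, label in instances:
    print(window, label)
```

Setting `width=1` gives the |w| = 1 case mentioned on the slide, where every event becomes its own instance.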

Page 9: Challenges for machine learning approaches

[Figure: login sessions of a user and an intruder, with the intrusion marked]

• Very skewed class distributions (inherent asymmetry)
• Differing error costs
• Imprecision in class and cost distributions
• Temporal dependencies among alarms
    Earlier is better than later
    Several is (usually) no better than one
• Solutions may use different representations
    Different timescales, different granularity
    |w| = 1 command
    |w| = 1 login session
    |w| = 1 process life

Page 10: Formalism

[Figure: stream Di drawn as a sequence of data items d, with a span of normal activity followed by a span of positive activity]

• D: set of data streams being monitored
• Di = <d1, d2, d3, ..., dn>: sequence of data items in stream Di
• α: alarm time
• τ: onset of positive activity

Each episode has at most one τ.

Benefit/cost of alarms:
    s(τ, α, H, Di): benefit of α if true positive
    f(α, H, Di): cost of α if false positive
    (H is alarm history; see paper)
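One way to read the formalism is as a per-episode scoring loop. The sketch below assumes that an alarm at or after the onset counts as a true positive and any other alarm as a false positive, that the history H is simply the list of earlier alarm times in the episode, and it drops the stream argument Di for brevity. The names `evaluate_episode`, `first_hit_only` and `unit_cost` are hypothetical and do not come from the talk.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Score = Callable[[float, float, Sequence[float]], float]  # s(tau, alpha, H)
Cost = Callable[[float, Sequence[float]], float]          # f(alpha, H)


def evaluate_episode(
    alarms: Sequence[float], tau: Optional[float], s: Score, f: Cost
) -> Tuple[float, float]:
    """Return (total benefit, total false-alarm cost) for one episode.

    `tau` is the onset of positive activity, or None for an episode
    that contains only normal activity.
    """
    benefit = 0.0
    cost = 0.0
    history: List[float] = []  # H: alarms issued earlier in this episode
    for alpha in sorted(alarms):
        if tau is not None and alpha >= tau:
            benefit += s(tau, alpha, history)
        else:
            cost += f(alpha, history)
        history.append(alpha)
    return benefit, cost


def first_hit_only(tau: float, alpha: float, history: Sequence[float]) -> float:
    """Assumed s: only the first alarm after the onset earns any benefit."""
    already_hit = any(a >= tau for a in history)
    return 0.0 if already_hit else 1.0


def unit_cost(alpha: float, history: Sequence[float]) -> float:
    """Assumed f: every false alarm costs 1."""
    return 1.0


benefit, cost = evaluate_episode(
    [2.0, 7.0, 9.0], tau=5.0, s=first_hit_only, f=unit_cost
)
print(benefit, cost)
```

Passing the history into s is what lets a scoring function express that several alarms on the same episode are no better than one.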

Page 11: Formalism

• D: set of data streams being monitored
• Di = <d1, d2, d3, ..., dn>: sequence of data items in stream i
• α: alarm time
• τ: onset of positive activity

Benefit/cost of alarms:
    s(τ, α, H, Di): benefit of α if true positive
    f(α, H, Di): cost of α if false positive
    (H is alarm history; see paper)

Example: plot of s as a function of alarm time α, with an empty alarm history (H = ∅)

[Figure: s on the vertical axis, ranging from 0 to smax]

Page 12: Detecting digital switch failures

[Figure: plot of s, ranging from 0 to smax, against alarm time. Marked points: onset of observable switch abnormalities; minimum advance notice; hard failure point]
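The shape of the curve on this slide is not recoverable from the transcript, so the following is only one plausible reading of its labels: an alarm earns the full benefit if it falls between the onset of observable abnormalities and the last moment that still leaves the minimum advance notice before hard failure. The step shape, the function name `switch_alarm_score` and all parameter names are assumptions.

```python
def switch_alarm_score(
    alpha: float,
    onset: float,
    failure: float,
    min_notice: float,
    s_max: float = 1.0,
) -> float:
    """Assumed step-shaped benefit of an alarm at time `alpha`.

    The alarm is useful only if it comes at or after the onset of
    observable abnormalities and at least `min_notice` time units
    before the hard failure point.
    """
    latest_useful = failure - min_notice
    if onset <= alpha <= latest_useful:
        return s_max
    return 0.0


# Hypothetical episode: abnormalities start at t=100, failure at t=160,
# and operators need at least 20 time units of warning.
for alpha in (90.0, 120.0, 150.0):
    print(alpha, switch_alarm_score(alpha, onset=100.0, failure=160.0, min_notice=20.0))
```

A score that decays as the alarm approaches the failure point would fit the same labels; the step function is used here only because it is the simplest case.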

Page 13: How is this framework better?

More realistic evaluation of solution methods

• Differing error costs
• Skewed class distributions

AMOC analysis

• Temporal dependencies among alarms:
    Earlier is better than later
    Several is no better than one
• Solutions may use different representations
    Different timescales, granularities
• Time and alarm history in s and f
• AMOC normalizes with respect to time (no definite notion of false positive max)

Page 14: ROC curves vs AMOC curves

Random alarms with different frequencies (0.1/hr, 0.2/hr, etc.)

[Figure: AMOC curves]

s(τ, α) = 1 if 0 ≤ α − τ ≤ 5; 0 otherwise

f = 1
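The random-alarm baseline can be simulated along the following lines. The sketch assumes that each point of an AMOC curve pairs a false-alarm rate normalized by time with a mean score per positive episode, that the bound of 5 in s is in hours, that random alarms follow a Poisson process, and that only the first alarm after the onset is scored. The episode lengths and onset times are invented for the example, and `random_alarms` and `amoc_point` are hypothetical names.

```python
import random
from typing import List, Optional, Sequence, Tuple

WINDOW = 5.0  # upper bound on alpha - tau in s; units assumed to be hours


def s(tau: float, alpha: float) -> float:
    """Score from the slide: 1 if 0 <= alpha - tau <= WINDOW, else 0."""
    return 1.0 if 0.0 <= alpha - tau <= WINDOW else 0.0


def random_alarms(rate: float, duration: float, rng: random.Random) -> List[float]:
    """Alarm times from a Poisson process with `rate` alarms per hour."""
    alarms: List[float] = []
    if rate <= 0:
        return alarms
    t = rng.expovariate(rate)
    while t < duration:
        alarms.append(t)
        t += rng.expovariate(rate)
    return alarms


def amoc_point(
    episodes: Sequence[Tuple[float, Optional[float]]],
    rate: float,
    rng: random.Random,
) -> Tuple[float, float]:
    """Return (false alarms per hour of normal activity, mean score).

    Each episode is (duration, tau); tau is None for normal-only episodes.
    """
    false_alarms = 0
    normal_time = 0.0
    score = 0.0
    positives = 0
    for duration, tau in episodes:
        alarms = random_alarms(rate, duration, rng)
        if tau is None:
            false_alarms += len(alarms)
            normal_time += duration
            continue
        positives += 1
        normal_time += tau
        false_alarms += sum(1 for a in alarms if a < tau)
        hits = [a for a in alarms if a >= tau]
        if hits:
            score += s(tau, hits[0])  # only the first alarm after onset counts
    fa_rate = false_alarms / normal_time if normal_time > 0 else 0.0
    mean_score = score / positives if positives else 0.0
    return fa_rate, mean_score


# Hypothetical data: 200 positive episodes (onset at hour 60) and 200 normal ones.
episodes = [(100.0, 60.0)] * 200 + [(100.0, None)] * 200
curve = [amoc_point(episodes, r, random.Random(0)) for r in (0.1, 0.2, 0.5)]
for fa_rate, mean_score in curve:
    print(round(fa_rate, 3), round(mean_score, 3))
```

Because the horizontal axis is a rate per unit time rather than a fraction of negatives, no maximum number of false positives has to be fixed in advance.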

Page 15: Activity monitoring: Solution approaches

Fundamental problem characteristics:

• Asymmetry of classes: Positive activity is inherently rare

    Discriminating method: differentiates positive and normal activity

    vs

    Profiling method: models normal activity without reference to positive activity (i.e., learning from negative examples only)

• Multi-level representation of data

    Uniform modeling: models activity uniformly across all monitored entities

    vs

    Individual modeling: models the activity of each stream Di individually
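As a rough illustration of a profiling method combined with individual modeling, the sketch below builds one profile per user from that user's normal commands only and scores a new window by how many of its commands the profile has never seen. The class `CommandProfile`, the novelty measure, the threshold, and the command histories (loosely based on the intrusion detection example) are all assumptions, not a method from the talk.

```python
from collections import Counter
from typing import Dict, Iterable, List


class CommandProfile:
    """Profile of one entity's normal activity (negative examples only)."""

    def __init__(self, normal_commands: Iterable[str]) -> None:
        self.counts = Counter(normal_commands)

    def novelty(self, window: List[str]) -> float:
        """Fraction of commands in the window never seen in the profile."""
        if not window:
            return 0.0
        unseen = sum(1 for cmd in window if cmd not in self.counts)
        return unseen / len(window)


# Individual modeling: one profile per monitored stream.
history: Dict[str, List[str]] = {
    "user1": ["ls", "from", "cd", "latex", "gv", "emacs", "latex", "xfig"],
    "user2": ["ls", "cd", "ls", "from", "emacs", "logout"],
}
profiles = {user: CommandProfile(cmds) for user, cmds in history.items()}

THRESHOLD = 0.5  # hypothetical alarm threshold
window = ["su", "ypcat", "from", "finger"]
score = profiles["user2"].novelty(window)
alarm = score >= THRESHOLD
print(score, alarm)
```

A uniform-modeling variant would pool all users' commands into a single profile, and a discriminating variant would also need labelled examples of positive activity.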

Page 16: Example: Monitoring business news

• Goal: Scan news stories associated with businesses, alarm on stories that correlate with “interesting” behavior.

    Interesting = 10% change in stock price (up or down) within 34.5 hours

• Data: Yahoo stories and stock prices from 6000 companies over 3 months

• DC-1 system

    Developed for cell phone fraud detection

    Performs discriminating, individual modeling

[Figure: example data stream for Intel, labeled D_Intel]
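The definition of "interesting" can be turned into a labelling rule roughly as follows. The slide does not say which price the 10% change is measured against, so this sketch assumes the last quote at or before the story; the function name `spike_onset` and the price series are hypothetical.

```python
from typing import List, Optional, Tuple

HORIZON_HOURS = 34.5  # from the slide
MIN_CHANGE = 0.10     # 10% up or down, from the slide


def spike_onset(
    prices: List[Tuple[float, float]], story_time: float
) -> Optional[float]:
    """Return the time of the first qualifying price move after a story.

    `prices` holds (time in hours, price) pairs sorted by time.
    Assumption: the change is measured against the last quote at or
    before the story. Returns None if no quote within the horizon
    differs from that reference by at least MIN_CHANGE.
    """
    base = None
    for t, p in prices:
        if t <= story_time:
            base = p
    if base is None or base == 0:
        return None
    for t, p in prices:
        if story_time < t <= story_time + HORIZON_HOURS:
            if abs(p - base) / base >= MIN_CHANGE:
                return t
    return None


# Hypothetical quotes: (hours since start, price).
quotes = [(0.0, 50.0), (10.0, 51.0), (20.0, 56.5), (80.0, 40.0)]
print(spike_onset(quotes, story_time=5.0))
print(spike_onset(quotes, story_time=25.0))
```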

Page 17: Example: Monitoring business news

Textual indicators for price spikes:

    said [it] expects same period revenues increase over per share
    fourth compare[d] income quarter fiscal
    earnings per diluted fiscal quarter ended expenses months ended
    today reported consensus quarter earnings year ended repurchase
    lower than shortfall Q[1234] fourth-quarter first call
    below analyst for quarter research [and] development

s(τ, α) = 1 if 0 ≤ α − τ ≤ 34.5 hours; 0 otherwise

f = 1

[Figure: AMOC curve]

Page 18: Pitfalls in evaluation

Why performance may look better than it should

1. Evaluating too locally

• Windows shouldn’t overlap

• Behavior may be episodic or local (“bull market behavior”)

• Need out-of-time sampling

[Figure: stream Di split in time into a Train segment followed by a Test segment]
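Out-of-time sampling can be sketched as a split at a cutoff time, with an optional gap so that no training window reaches into the test period. The function name `out_of_time_split`, the `gap` parameter and the example stream are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def out_of_time_split(
    stream: Sequence[Tuple[float, T]], cutoff: float, gap: float = 0.0
) -> Tuple[List[Tuple[float, T]], List[Tuple[float, T]]]:
    """Split timestamped items into an earlier train set and a later test set.

    Items with time < cutoff - gap go to train, items with time >= cutoff
    go to test, and items inside the gap are dropped. Choosing a gap at
    least as wide as the window keeps train and test windows from
    overlapping.
    """
    train = [item for item in stream if item[0] < cutoff - gap]
    test = [item for item in stream if item[0] >= cutoff]
    return train, test


# Hypothetical stream of (time, event) pairs.
events = [(float(t), f"e{t}") for t in range(10)]
train, test = out_of_time_split(events, cutoff=6.0, gap=2.0)
print([t for t, _ in train], [t for t, _ in test])
```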

Page 19: Pitfalls in evaluation

2. Mixing events from a single account between train and test sets

• Goal of evaluation is to determine how well system will work on new, unseen accounts.

• Events within an account may be much more similar to each other than to events in other accounts

• Mixing one account’s examples between train and test sets may leak test info into training

• Need out-of-account sampling

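[Figure: two ways of splitting data into Train and Test sets: within each account's stream, or with whole accounts assigned to either Train or Test]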
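Out-of-account sampling amounts to splitting by account rather than by event, so that every event of a given account lands on the same side. The sketch below is one simple way to do that; the function name `out_of_account_split`, the seed handling and the account data are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def out_of_account_split(
    events_by_account: Dict[str, List[str]],
    test_fraction: float,
    seed: int = 0,
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Assign whole accounts to either train or test, never both."""
    accounts = sorted(events_by_account)
    rng = random.Random(seed)
    rng.shuffle(accounts)
    n_test = max(1, round(len(accounts) * test_fraction))
    test_accounts = set(accounts[:n_test])
    train = {a: e for a, e in events_by_account.items() if a not in test_accounts}
    test = {a: e for a, e in events_by_account.items() if a in test_accounts}
    return train, test


# Hypothetical accounts and events.
data = {
    "acct1": ["call", "call", "roam"],
    "acct2": ["call"],
    "acct3": ["roam", "call"],
    "acct4": ["call", "call"],
}
train_accounts, test_accounts = out_of_account_split(data, test_fraction=0.25)
print(sorted(train_accounts), sorted(test_accounts))
```

Splitting this way keeps the evaluation focused on how the system behaves on accounts it has never seen.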

Page 20: Conclusions

1. This form of anomaly detection is inherently classification

• Alarms correspond to true positives, false positives, etc.

• Classification methods can be brought to bear

2. But temporal aspects make standard classification metrics inappropriate

3. Activity monitoring domains are common in machine learning. Solution methods & strategies can be shared and adapted.

Page 21: [end]

Page 22: Activity monitoring: Learning methods

[Figure: monitored data streams D1, D2, D3, D4, D5, ...]

Problem characteristics:

• Class asymmetry: discriminating method vs profiling method

• Multi-level representation: uniform modeling vs individual modeling

Page 23: Transforming tau: Circuit failure

[Figure: two copies of a data stream of items d. Labels: Degradation; Beginning of positive visible activity; Implicit lookahead interval; Hard failure (end of episode)]