Activity monitoring: Anomaly detection as on-line classification. Tom Fawcett, HP Laboratories, 1501 Page Mill Rd., Palo Alto, CA. Symposium on Machine Learning for Anomaly Detection, May 22-23, 2004.
1Symposium on Machine Learning for Anomaly Detection
Activity monitoring: Anomaly detection as on-line classification
Tom Fawcett
HP Laboratories
1501 Page Mill Rd.
Palo Alto, CA
Symposium on Machine Learning for Anomaly Detection
May 22-23, 2004
Example: Intrusion detection
[Figure: command streams for User 1 through User 4 (commands such as pwd, ls, cd, gcc, a.out, netscape, emacs, latex, sudo, rlogin), with one segment marked "intrusion"]
Example: Monitoring digital switch health
[Figure: activity streams for switches S1, S2, S3, ..., Si; abnormal behavior culminating in hard failure]
Example: Monitoring business news
1 May 20, 1999 - VISX Inc. today announced that the Federal Trade Commission (FTC) has filed a notice of appeal of the decision issued earlier this month by the FTC administrative…
2 May 29 (Reuters) - Federal advisers backed approval Thursday of a laser made by VISX Inc. used to correct nearsightedness with or without astigmatism. […]
3 June 3 (PR Newswire) - VISX, Inc. of Santa Clara, California, will become a component of the Nasdaq-100 Index, effective at the beginning of trading Thursday, June 10, 1999. […]
4 June 4 (PR Newswire) - Amazon.com, facing threats of legal action from The New York Times, has asked the U.S. District Court in Seattle to allow Amazon.com to continue advertising.
5 June 5, 1999 - Motorola today announced that its MPC923, MPC950 and MPC960 PowerPC processors have been officially certified by Microsoft Corporation to support the…
6 June 8, 1999 (PR Newswire) - WebTV Networks, Inc. and EchoStar Communications Corp. at CES today announced the Microsoft WebTV Network Plus service for satellite and the EchoStar…
Monitoring business news: VISX
1 May 20, 1999 - VISX Inc. today announced that the Federal Trade Commission (FTC) has filed a notice of appeal of the decision issued earlier this month by the FTC administrative…
2 May 29 (Reuters) - Federal advisers backed approval Thursday of a laser made by VISX Inc. used to correct nearsightedness with or without astigmatism. […]
3 June 3 (PR Newswire) - VISX, Inc. of Santa Clara, California, will become a component of the Nasdaq-100 Index, effective at the beginning of trading Thursday, June 10, 1999. […]
[Figure: VISX stock price]
Commonalities of the domains
● Temporal: data comprise time series
● Large number of entities (users, companies, accounts, devices)
● Large volumes of data (commands, news stories, calls) on entity activity
● General goal is to alert on interesting, rare events (intrusions, fraud, unusual business activity)
Detection goals:
● Identify as many interesting events as possible
● Alert as soon as possible
● Minimize false alarms
[Figure: event stream with the onset of significant activity marked]
Activity monitoring problems
Intrusion detection
Lee, Stolfo, Mok (KDD-99)
Lane & Brodley (KDD-98)
Ryan et al. (FDRM-97)
DuMouchel & Schonlau (KDD-98)
Network alarm monitoring
Sasisekharan et al. (KDD-94)
Weiss & Hirsh (KDD-98)
Klemettinen 99
Hardware fault detection
Dasgupta & Forrest 96
Smyth 92
Topic detection and tracking
Allan, Papka & Lavrenko (SIGIR-98)
Crabtree & Soltysiak (IJCAI-97)
Allan (ed.), 2002
Fraud detection
Chan & Stolfo (KDD-98)
Cox et al. (DMKD-97)
Fawcett & Provost (KDD-97)
Burge & Shawe-Taylor (FDRM-97)
Ezawa & Norton (KDD-95)
News/event alerts
Yang, Pierce & Carbonell
Fawcett & Provost (KDD-99)
Epidemic/bio-terrorism detection
Wong et al. 2002, 2003
Shmueli 2004
Standard supervised learning approach
[Figure: an event stream with the onset of significant activity marked. Window vector extraction cuts the stream into windows w1, w2, w3, ..., wn-1, wn; each window becomes an instance vector with a class label (- or +), which turns monitoring into a classification problem]
Many approaches use |w| = 1
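The window vector extraction step can be sketched in a few lines. This is only an illustration, not the method of any system cited in this talk: the function name extract_windows, the choice of non-overlapping windows, and the rule that a window is labeled "+" once it reaches past the onset index are all assumptions made for the example.

```python
from typing import List, Optional, Sequence, Tuple

Window = Tuple[Tuple[str, ...], str]


def extract_windows(stream: Sequence[str], width: int,
                    onset: Optional[int]) -> List[Window]:
    """Cut a stream into consecutive, non-overlapping windows of `width` items.

    `onset` is the index of the first item of positive activity (None if the
    stream has none). A window is labeled "+" if it contains any item at or
    after the onset, and "-" otherwise. Trailing items that do not fill a
    whole window are dropped.
    """
    windows: List[Window] = []
    for start in range(0, len(stream) - width + 1, width):
        end = start + width  # exclusive end index of this window
        label = "+" if onset is not None and end > onset else "-"
        windows.append((tuple(stream[start:end]), label))
    return windows


# Hypothetical command stream; positive activity is assumed to start at index 4.
commands = ["ls", "cd", "gcc", "a.out", "su", "rlogin"]
for vector, label in extract_windows(commands, width=2, onset=4):
    print(vector, label)
```

Setting width=1 gives the |w| = 1 case, where every single event is its own instance.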
Challenges for machine learning approaches
[Figure: login sessions of a user and an intruder, with the intrusion marked]
• Very skewed class distributions – inherent asymmetry
• Differing error costs
• Imprecision in class and cost distributions
• Temporal dependencies among alarms
Earlier is better than later
Several is (usually) no better than one
• Solutions may use different representations
Different timescales, different granularity
|w| = 1 command
|w| = 1 login session
|w| = 1 process life
Formalism
[Figure: stream Di drawn as a sequence of data items d, with normal activity followed by positive activity]
• D: set of data streams being monitored
• Di = <d1, d2, d3, ..., dn>: sequence of data items in stream Di
• α: alarm time
• τ: onset of positive activity
Each episode has at most one τ
Benefit/cost of alarms:
s(τ, α, H, Di): benefit of α if true positive
f(α, H, Di): cost of α if false positive
(H is alarm history; see paper)
Formalism
• D: set of data streams being monitored
• Di = <d1, d2, d3, ..., dn>: sequence of data items in stream i
• α: alarm time
• τ: onset of positive activity
Benefit/cost of alarms:
s(τ, α, H, Di): benefit of α if true positive
f(α, H, Di): cost of α if false positive
(H is alarm history; see paper)
Example: plot of s as a function of alarm time α
[Figure: s on the vertical axis, from 0 to smax, against alarm time]
Detecting digital switch failures
[Figure: plot of s, from 0 to smax, against alarm time; marked points: onset of observable switch abnormalities, minimum advance notice, hard failure point]
How is this framework better?
More realistic evaluation of solution methods
• Differing error costs
• Skewed class distributions
AMOC (Activity Monitoring Operating Characteristic) analysis
• Temporal dependencies among alarms:
Earlier is better than later
Several is no better than one
• Solutions may use different representations
Different timescales, granularities
• Time and alarm history in s and f
• AMOC normalizes with respect to time
(no definite notion of false positive max)
ROC curves vs AMOC curves
Random alarms with different frequencies (.1/hr, .2/hr, etc.)
[Figure: AMOC curves]
s(τ,α) = 1 if 0 ≤ α-τ ≤ 5, 0 otherwise
f = 1
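A step scoring function of this kind, and one point of an AMOC-style curve, can be sketched as below. Several things here are assumptions made for illustration rather than details taken from the talk: the names step_score, evaluate_episode and amoc_point, the inclusive bounds on α-τ, the rule that only the first alarm at or after the onset is scored (one reading of "several is no better than one"), and the use of the time before onset as the normalizing period for false alarms.

```python
from typing import List, Sequence, Tuple


def step_score(tau: float, alpha: float, horizon: float = 5.0) -> float:
    """s(tau, alpha): 1 if the alarm falls within `horizon` after onset, else 0."""
    return 1.0 if 0.0 <= alpha - tau <= horizon else 0.0


def evaluate_episode(alarms: Sequence[float], tau: float,
                     horizon: float = 5.0, f: float = 1.0) -> Tuple[float, float]:
    """Return (benefit, total false alarm cost) for one stream.

    Alarms before the onset are false positives costing `f` each.
    Only the earliest alarm at or after the onset earns a benefit.
    """
    false_cost = sum(f for a in alarms if a < tau)
    hits = [a for a in alarms if a >= tau]
    benefit = step_score(tau, min(hits), horizon) if hits else 0.0
    return benefit, false_cost


def amoc_point(episodes: List[Tuple[Sequence[float], float]],
               horizon: float = 5.0) -> Tuple[float, float]:
    """One curve point: (false alarm cost per unit of normal time, mean benefit).

    Each episode is (alarm times, onset time); each stream is assumed to start
    at time 0, so `tau` is also the length of its normal period.
    """
    total_benefit = total_false = total_normal_time = 0.0
    for alarms, tau in episodes:
        benefit, false_cost = evaluate_episode(alarms, tau, horizon)
        total_benefit += benefit
        total_false += false_cost
        total_normal_time += tau
    return total_false / total_normal_time, total_benefit / len(episodes)


# Two hypothetical streams: (alarm times, onset time), all in hours.
episodes = [([2.0, 12.0], 10.0), ([30.0], 20.0)]
print(amoc_point(episodes))
```

Sweeping a detector's threshold, or the frequency of random alarms, and plotting one such point per setting would trace out a curve of this kind.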
Activity monitoring: Solution approaches
Fundamental problem characteristics:
• Asymmetry of classes: Positive activity is inherently rare
Discriminating method: differentiates positive and normal activity
vs
Profiling method: models normal activity without reference to positive activity (i.e., learning from negative examples only)
• Multi-level representation of data
Uniform modeling: Models activity uniformly across all monitored entities
vs
Individual modeling: Models Di activity individually
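To make the profiling and individual-modeling options concrete, here is a small sketch that builds one profile per user from that user's normal commands only and scores new windows against it. It is a toy illustration, not DC-1 or any other cited system: the class name CommandProfile, the smoothed frequency model, and the use of average negative log-probability as an anomaly score are all assumptions.

```python
import math
from collections import Counter
from typing import Dict, Sequence


class CommandProfile:
    """Frequency profile of one user's normal commands (negative examples only)."""

    def __init__(self, history: Sequence[str], smoothing: float = 1.0) -> None:
        self.counts = Counter(history)
        self.total = len(history)
        self.smoothing = smoothing
        # One extra slot is reserved so unseen commands get a small probability.
        self.vocab = len(self.counts) + 1

    def log_prob(self, command: str) -> float:
        """Smoothed log-probability of a command under this profile."""
        num = self.counts[command] + self.smoothing
        den = self.total + self.smoothing * self.vocab
        return math.log(num / den)

    def anomaly_score(self, window: Sequence[str]) -> float:
        """Average negative log-probability; higher means less like the profile."""
        return -sum(self.log_prob(c) for c in window) / len(window)


# Individual modeling: a separate profile for each monitored user (hypothetical data).
profiles: Dict[str, CommandProfile] = {
    "user1": CommandProfile(["ls", "cd", "ls", "gcc"]),
    "user2": CommandProfile(["netscape", "emacs", "latex", "gv"]),
}
print(profiles["user1"].anomaly_score(["ls", "cd"]))
print(profiles["user1"].anomaly_score(["sudo", "rlogin"]))
```

A uniform model would instead pool all users' histories into a single profile, and a discriminating method would also need labeled positive examples to train against.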
Example: Monitoring business news
• Goal: Scan news stories associated with businesses, alarm on stories that correlate with “interesting” behavior.
Interesting = 10% change in stock price (up or down) within 34.5 hours
• Data: Yahoo stories and stock prices from 6000 companies over 3 months
• DC-1 system
Developed for cell phone fraud detection
Performs discriminating, individual modeling
[Figure: data stream DIntel]
Example: Monitoring business news
Textual indicators for price spikes:
said [it] expects same period revenues increase over per share
fourth compare[d] income quarter fiscal
earnings per diluted fiscal quarter ended expenses months ended
today reported consensus quarter earnings year ended repurchase
lower than shortfall Q[1234] fourth-quarter first call
below analyst for quarter research [and] development
s(τ,α) = 1 if 0 ≤ α-τ ≤ 34.5 hours, 0 otherwise
f = 1
[Figure: AMOC curve]
Pitfalls in evaluation
Why performance may look better than it should
1. Evaluating too locally
• Windows shouldn’t overlap
• Behavior may be episodic or local (“bull market behavior”)
Need out-of-time sampling
[Figure: stream Di split in time into Train and Test segments]
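One way to do out-of-time sampling is to split on a cutoff time and discard any window that straddles it, so no test window shares events with a training window. The sketch below is an illustration only; the function name out_of_time_split and the (start, end, payload) window representation are assumptions, not something specified in the talk.

```python
from typing import List, Tuple

TimedWindow = Tuple[float, float, str]  # (start time, end time, payload)


def out_of_time_split(windows: List[TimedWindow],
                      cutoff: float) -> Tuple[List[TimedWindow], List[TimedWindow]]:
    """Train on windows that end by `cutoff`, test on windows that start at or after it.

    Windows that straddle the cutoff are dropped so train and test never overlap.
    """
    train = [w for w in windows if w[1] <= cutoff]
    test = [w for w in windows if w[0] >= cutoff]
    return train, test


# Hypothetical windows; the third one straddles the cutoff and is discarded.
windows = [(0, 10, "a"), (10, 20, "b"), (15, 25, "c"), (20, 30, "d")]
train, test = out_of_time_split(windows, cutoff=20)
print(train)
print(test)
```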
Pitfalls in evaluation
2. Mixing events from a single account between train and test sets
• Goal of evaluation is to determine how well system will work on new, unseen accounts.
• Events within an account may be much more similar to each other than to events in other accounts
• Mixing one account’s examples between train and test sets may leak test info into training
• Need out-of-account sampling
[Figure: Train/Test assignment of account streams]
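Out-of-account sampling can be sketched by assigning whole accounts, rather than individual events, to one side of the split. This is a minimal illustration under assumptions: the function name out_of_account_split, the (account, event) tuple layout, and the seeded shuffle are all choices made for the example.

```python
import random
from typing import List, Tuple

Event = Tuple[str, str]  # (account id, event payload)


def out_of_account_split(events: List[Event], test_fraction: float = 0.25,
                         seed: int = 0) -> Tuple[List[Event], List[Event]]:
    """Split so that every account's events land entirely in train or entirely in test."""
    accounts = sorted({account for account, _ in events})
    rng = random.Random(seed)  # seeded so the split is reproducible
    rng.shuffle(accounts)
    n_test = max(1, round(len(accounts) * test_fraction))
    test_accounts = set(accounts[:n_test])
    train = [e for e in events if e[0] not in test_accounts]
    test = [e for e in events if e[0] in test_accounts]
    return train, test


# Hypothetical data: four accounts with two events each.
events = [(acct, f"event{i}") for acct in ("A", "B", "C", "D") for i in range(2)]
train, test = out_of_account_split(events)
print(sorted({a for a, _ in train}), sorted({a for a, _ in test}))
```

Because no account appears on both sides, a score on the test set estimates performance on new, unseen accounts.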
Conclusions
1. This form of anomaly detection is inherently classification
• Alarms → true positives, false positives, etc.
• Classification methods can be brought to bear
2. But temporal aspects make standard classification metrics inappropriate
3. Activity monitoring domains are common in machine learning. Solution methods & strategies can be shared and adapted.
[end]
Activity monitoring: Learning methods
[Figure: data streams D1, D2, D3, D4, D5, ...]
Problem characteristics
Class asymmetry: discriminating method vs profiling method
Multi-level representation: uniform modeling vs individual modeling
Transforming tau: Circuit failure
[Figure: streams of data items d ending in hard failure (end of episode); marked: beginning of positive visible activity, degradation, implicit lookahead interval]