A Machine Learning Approach to Detecting Attacks by Identifying Anomalies in Network Traffic A Dissertation by Matthew V. Mahoney Major Advisor: Philip

A Machine Learning Approach to Detecting Attacks

by Identifying Anomalies in Network Traffic

A Dissertation

by Matthew V. Mahoney

Major Advisor: Philip K. Chan

Overview

• Related work in intrusion detection

• Approach

• Experimental results

– Simulated network

– Real background traffic

• Conclusions and future work

Limitations of Intrusion Detection

• Host based (audit logs, virus checkers, system calls (Forrest 1996))

– Cannot be trusted after a compromise

• Network signature detection (SNORT (Roesch 1999), Bro (Paxson 1998))

– Cannot detect novel attacks

– Alarms occur in bursts

• Address/port anomaly detection (ADAM (Barbara 2001), SPADE (Hoagland 2000), eBayes (Valdes & Skinner 2000))

– Cannot detect attacks on public servers (web, mail)

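Intrusion Detection Dimensions

[Figure: intrusion detection systems arranged along three dimensions: model (user, system), data (host, network), and method (signature, anomaly). Systems shown: BSM, virus detection, SNORT, Bro, audit logs, firewalls, SPADE, ADAM, eBayes, and network protocol anomaly detection.]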

Problem Statement

• Detect (not prevent) attacks in network traffic

• No prior knowledge of attack characteristics

[Figure: attack-free training data builds a model of normal traffic; the IDS applies the model to test data containing attacks and outputs alarms.]

Approach

1. Model protocols (extend user model)

2. Time-based model of “bursty” traffic

3. Learn conditional rules

4. Batch and continuous modeling

5. Test with simulated attacks and real background traffic

Approach 1. Protocol Modeling

• User model (conventional)

– Source address for authentication

– Destination port to detect scans

• Protocol model (new)

– Unusual features (more likely to be vulnerable)

– Client idiosyncrasies

– IDS evasion

– Victim’s symptoms after an attack

Example Protocol Anomalies

Attack                                                | How detected     | Category
Teardrop – overlapping IP fragments crashes target    | IP fragments     | Unusual feature
Sendmail – buffer overflow gives remote root shell    | Lower case mail  | Idiosyncrasy
FIN scan (portsweep) – FIN packets not logged         | FIN without ACK  | Evasion
ARPpoison – forged replies to ARP-who-has             | Interrupted TCP  | Victim symptoms
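The FIN scan row can be illustrated with a minimal sketch. It assumes a hypothetical `FlagModel` that simply remembers every TCP flag combination seen in attack-free training traffic; the class name and the training set are illustrative assumptions, and only the flag bit values come from the TCP header definition.

```python
# TCP flag bits as defined in the TCP header.
FIN, SYN, RST, PSH, ACK = 0x01, 0x02, 0x04, 0x08, 0x10

def flag_names(flags):
    """Render a flag byte as a string such as 'FIN+ACK'."""
    names = [("FIN", FIN), ("SYN", SYN), ("RST", RST), ("PSH", PSH), ("ACK", ACK)]
    return "+".join(name for name, bit in names if flags & bit) or "none"

class FlagModel:
    """Hypothetical model: the set of flag combinations seen in training."""

    def __init__(self):
        self.seen = set()

    def train(self, flags):
        self.seen.add(flags)

    def is_anomaly(self, flags):
        # Any combination never seen in training is an anomaly.
        return flags not in self.seen

# Assumed training traffic: a normal connection always closes with FIN+ACK.
model = FlagModel()
for flags in (SYN, SYN | ACK, ACK, PSH | ACK, FIN | ACK):
    model.train(flags)

print(flag_names(FIN), model.is_anomaly(FIN))              # bare FIN: never seen
print(flag_names(FIN | ACK), model.is_anomaly(FIN | ACK))  # normal close
```

No signature for the scan is written down; the bare FIN stands out only because normal clients never produce it.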

Approach 2. Non-Poisson Traffic Model (Paxson & Floyd, 1995)

• Events occur in bursts on all time scales

• Long range dependency

• No average rate of events

• Event probability depends on

– The average rate in the past

– And the time since it last occurred

Time-Based Model

If port = 25 then word1 = HELO or EHLO

• Anomaly: any value never seen in training

• Score = tn/r

– t = time since last anomaly for this rule

– n = number of training instances (port = 25)

– r = number of allowed values (2)

• Only the first anomaly in a burst receives a high score
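A minimal sketch of one such rule, assuming explicit timestamps are passed in by the caller. The class name `TimeBasedRule` and its methods are hypothetical, not the dissertation's code; the score is the tn/r defined above.

```python
class TimeBasedRule:
    """One rule, e.g. 'if port = 25 then word1 = HELO or EHLO' (hypothetical class)."""

    def __init__(self):
        self.n = 0               # training instances satisfying the antecedent
        self.allowed = set()     # values seen in training; r = len(self.allowed)
        self.last_anomaly = 0.0  # time of the most recent anomaly for this rule

    def train(self, value, now):
        self.n += 1
        if value not in self.allowed:
            # The first occurrence of a value in training counts as an anomaly.
            self.allowed.add(value)
            self.last_anomaly = now

    def score(self, value, now):
        if self.n == 0 or value in self.allowed:
            return 0.0
        t = now - self.last_anomaly
        self.last_anomaly = now  # later anomalies in the same burst get a small t
        return t * self.n / len(self.allowed)

# Assumed training data: 100 instances, two allowed values.
rule = TimeBasedRule()
rule.train("HELO", now=0)
rule.train("EHLO", now=1)
for second in range(2, 100):
    rule.train("HELO", now=second)

first = rule.score("QUIT", now=101)   # t = 100, n = 100, r = 2
second = rule.score("QUIT", now=102)  # t = 1: same burst, much lower score
print(first, second)
```

Resetting the time of the last anomaly on every alarm is what keeps the rest of a burst from scoring high.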

Example

Training = AAAABBBBAA

Test = AACCC

• C is an anomaly

• r/n = average rate of training anomalies = 2/10 (first A and first B)

• t = time since last anomaly = 9, 1, 1

• Score (C) = tn/r = 45, 5, 5
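The arithmetic of this example can be checked in a few lines. The values of t are taken from the slide as given; n and r are derived from the training string.

```python
training = "AAAABBBBAA"

n = len(training)       # number of training instances
r = len(set(training))  # number of allowed values (A and B)
t_values = [9, 1, 1]    # time since the last anomaly for each C, as stated above

scores = [t * n / r for t in t_values]
print(n, r, scores)
```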

Approach 3. Rule Learning

1. Sample training pairs to suggest rules with n/r = 2/1

2. Remove redundant rules, favoring high n/r

3. Validation: remove rules that generate alarms on attack-free traffic

Learning Step 1 - Sampling

Port  Word1  Word2        Word3
80    GET    /            HTTP/1.0
80    GET    /index.html  HTTP/1.0

• If port = 80 then word1 = GET

• word3 = HTTP/1.0

• If word3 = HTTP/1.0 and word1 = GET then port = 80
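A minimal sketch of the sampling step, under stated assumptions: two training instances are compared, and rules are built only from the attributes on which they agree, so every suggested rule starts with n/r = 2/1 on that pair. How the consequent and the conditions are drawn (a random consequent and a random subset of the other matching attributes) is an assumption of this sketch, as are the function name and the dictionary layout of a rule.

```python
import random

def suggest_rules(a, b, rng, count=3):
    """Suggest candidate rules from two training instances (attribute -> value dicts)."""
    matching = [k for k in a if k in b and a[k] == b[k]]
    rules = []
    if not matching:
        return rules
    for _ in range(count):
        consequent = rng.choice(matching)
        rest = [k for k in matching if k != consequent]
        conditions = rng.sample(rest, rng.randint(0, len(rest)))
        rules.append({
            "if": {k: a[k] for k in conditions},  # antecedent (may be empty)
            "then": consequent,                   # attribute being constrained
            "allowed": {a[consequent]},           # one allowed value so far
        })
    return rules

row1 = {"port": 80, "word1": "GET", "word2": "/", "word3": "HTTP/1.0"}
row2 = {"port": 80, "word1": "GET", "word2": "/index.html", "word3": "HTTP/1.0"}

for rule in suggest_rules(row1, row2, random.Random(0)):
    print(rule)
```

Word2 differs between the two rows, so it never appears in a suggested rule.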

Learning Step 2 – Remove Redundant Rules (Sorted by n/r)

• R1: if port = 80 then word1 = GET (n/r = 2/1, OK)

• R2: word1 = HELO or GET (n/r = 3/2, OK)

• R3: if port = 25 then word1 = HELO (n/r = 1/1, remove)

• R4: word2 = pascal, /, or /index.html (n/r = 3/3, OK)

Port  Word1  Word2        Word3
25    HELO   pascal       MAIL
80    GET    /            HTTP/1.0
80    GET    /index.html  HTTP/1.0
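The example above can be reproduced with a short sketch. It assumes a rule is redundant when every training value it predicts is already predicted by a rule with higher n/r; the function names and the rule layout are hypothetical.

```python
def matches(rule, row):
    """True if the row satisfies the rule's antecedent."""
    return all(row[k] == v for k, v in rule["if"].items())

def n_and_r(rule, data):
    """n = rows matching the antecedent, r = distinct consequent values among them."""
    values = [row[rule["then"]] for row in data if matches(rule, row)]
    return len(values), len(set(values))

def remove_redundant(rules, data):
    def ratio(rule):
        n, r = n_and_r(rule, data)
        return n / r if r else 0.0

    covered = set()  # (row index, attribute) pairs already predicted by a kept rule
    kept = []
    for rule in sorted(rules, key=ratio, reverse=True):
        predicted = {(i, rule["then"]) for i, row in enumerate(data) if matches(rule, row)}
        if predicted - covered:  # the rule predicts at least one new value
            kept.append(rule)
            covered |= predicted
    return kept

data = [
    {"port": 25, "word1": "HELO", "word2": "pascal", "word3": "MAIL"},
    {"port": 80, "word1": "GET", "word2": "/", "word3": "HTTP/1.0"},
    {"port": 80, "word1": "GET", "word2": "/index.html", "word3": "HTTP/1.0"},
]
rules = [
    {"name": "R1", "if": {"port": 80}, "then": "word1"},
    {"name": "R2", "if": {}, "then": "word1"},
    {"name": "R3", "if": {"port": 25}, "then": "word1"},
    {"name": "R4", "if": {}, "then": "word2"},
]

kept_names = [rule["name"] for rule in remove_redundant(rules, data)]
print(kept_names)
```

R3 is dropped because the only value it predicts (word1 of the first row) is already predicted by R2.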

Learning Step 3 – Rule Validation

• Training (no attacks) – Learn rules, n/r

• Validation (no attacks) – Discard rules that generate alarms

• Testing (with attacks)

[Timeline: Train, Validate, Test]
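A minimal sketch of the validation step, assuming each rule carries its antecedent, its consequent attribute, and the set of allowed values learned in training. The function names and the rule layout are hypothetical.

```python
def violates(rule, row):
    """True if the row satisfies the antecedent but has a value not allowed by the rule."""
    applies = all(row.get(k) == v for k, v in rule["if"].items())
    return applies and row.get(rule["then"]) not in rule["allowed"]

def validate(rules, validation_rows):
    """Keep only the rules that raise no alarm on attack-free validation traffic."""
    return [
        rule for rule in rules
        if not any(violates(rule, row) for row in validation_rows)
    ]

rules = [
    {"if": {"port": 80}, "then": "word1", "allowed": {"GET"}},
    {"if": {"port": 25}, "then": "word1", "allowed": {"HELO", "EHLO"}},
]
# Assumed attack-free validation traffic: a legitimate POST the first rule never saw.
validation_rows = [
    {"port": 80, "word1": "POST"},
    {"port": 25, "word1": "EHLO"},
]

surviving = validate(rules, validation_rows)
print(surviving)
```

A rule that fires on clean traffic would only produce false alarms in testing, so it is discarded rather than repaired.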

Approach 4. Continuous Modeling

• No separate training and test phases

• Training data may contain attacks

• Model allows for previously seen values

• Score = tn/r + ti/fi

– ti = time since value i last seen

– fi = frequency of i in training, fi > 0

• No validation step
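A minimal sketch of continuous scoring for a single attribute. It assumes that a novel value is scored by the first term (tn/r) and a previously seen value by the second (ti/fi, which needs fi > 0), and that every observation is added to the model after it is scored. That split, the class name, and the explicit timestamps are assumptions of this sketch.

```python
class ContinuousAttribute:
    """Hypothetical online model of one attribute; there is no separate training phase."""

    def __init__(self):
        self.n = 0            # observations so far
        self.count = {}       # f_i: how often value i has been seen
        self.last_seen = {}   # time value i was last seen
        self.last_novel = 0.0 # time of the last novel value

    def observe(self, value, now):
        """Score the value against the current model, then update the model."""
        if value in self.count:
            # Seen before: t_i / f_i, high for values that are rare and long unseen.
            score = (now - self.last_seen[value]) / self.count[value]
        else:
            # Novel value: t * n / r, as in the batch model.
            r = len(self.count)
            score = (now - self.last_novel) * self.n / r if r else 0.0
            self.last_novel = now
        self.n += 1
        self.count[value] = self.count.get(value, 0) + 1
        self.last_seen[value] = now
        return score

model = ContinuousAttribute()
scores = [
    model.observe("A", now=0),   # empty model: nothing to compare against
    model.observe("A", now=1),   # seen once, 1 second ago
    model.observe("A", now=2),   # seen twice, 1 second ago
    model.observe("B", now=10),  # novel: t = 10, n = 3, r = 1
    model.observe("B", now=11),  # now a seen value
]
print(scores)
```

Because each value joins the model once observed, an attack present in the data raises an alarm the first time and then decays, instead of poisoning a fixed training set.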

Implementation

Model  | Data            | Conditions    | Validation | Score
PHAD   | Packet headers  | None          | No         | tn/r
ALAD   | TCP streams     | Server, port  | No         | tn/r
LERAD  | TCP streams     | Learned       | Yes        | tn/r
NETAD  | Packet bytes    | Protocol      | Yes        | tn/r + ti/fi

Example Rules (LERAD)

1 39406/1 if SA3=172 then SA2 = 016
2 39406/1 if SA2=016 then SA3 = 172
3 28055/1 if F1=.UDP then F3 = .
4 28055/1 if F1=.UDP then F2 = .
5 28055/1 if F3=. then F1 = .UDP
6 28055/1 if F3=. then DUR = 0
7 27757/1 if DA0=100 then DA1 = 112
8 25229/1 if W6=. then W7 = .
9 25221/1 if W5=. then W6 = .
10 25220/1 if W4=. then W8 = .
11 25220/1 if W4=. then W5 = .
12 17573/1 if DA1=118 then W1 = .^B^A^@^@
13 17573/1 if DA1=118 then SA1 = 112
14 17573/1 if SP=520 then DP = 520
15 17573/1 if SP=520 then W2 = .^P^@^@^@
16 17573/1 if DP=520 then DA1 = 118
17 17573/1 if DA1=118 SA1=112 then LEN = 5
18 28882/2 if F2=.AP then F1 = .S .AS
19 12867/1 if W1=.^@GET then DP = 80
20 68939/6 if then DA1 = 118 112 113 115 114 116
21 68939/6 if then F1 = .UDP .S .AF .ICMP .AS .R
22 9914/1 if W3=.HELO then W1 = .^@EHLO
23 9914/1 if F1=.S W3=.HELO then DP = 25
24 9914/1 if DP=25 W5=.MAIL then W3 = .HELO

1999 DARPA IDS Evaluation (Lippmann et al. 2000)

• 7 days training data with no attacks

• 2 weeks test data with 177 visible attacks

• Must identify victim and time of attack
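Results in this evaluation are reported as attacks detected at a fixed false alarm rate. A minimal sketch of that count follows, assuming each alarm has already been matched against the labeled attacks (by victim and time) and carries either an attack identifier or `None` for a false alarm. The function name and the input layout are hypothetical.

```python
def detections_at_budget(alarms, false_alarms_per_day, days):
    """Count distinct attacks detected before the false alarm budget is exceeded.

    alarms: list of (score, attack_id) pairs; attack_id is None for a false alarm.
    """
    budget = false_alarms_per_day * days
    detected = set()
    false_alarms = 0
    # Walk the alarms from highest to lowest score, i.e. lower the threshold step by step.
    for score, attack_id in sorted(alarms, key=lambda alarm: alarm[0], reverse=True):
        if attack_id is None:
            false_alarms += 1
            if false_alarms > budget:
                break
        else:
            detected.add(attack_id)  # repeated alarms for one attack count once
    return len(detected)

# Assumed toy data, not results from the evaluation.
alarms = [
    (9.0, "teardrop"), (8.0, None), (7.0, "portsweep"), (6.0, None),
    (5.0, "teardrop"), (4.0, None), (3.0, "sendmail"),
]
print(detections_at_budget(alarms, false_alarms_per_day=1, days=2))
```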

[Figure: evaluation network with SunOS, Solaris, Linux, and WinNT victims, the IDS, the simulated Internet, and attacks.]

Attacks Detected at 10 FA/Day

[Bar chart: number of attacks detected (axis 0 to 160) for PHAD, ALAD, LERAD, NETAD, and Continuous.]

Unlikely Detections

• Attacks on public servers (web, mail, DNS) detected by source address

• Application server attacks detected by packet header fields

• U2R (user to root) detected by FTP upload

Unrealistic Background Traffic

• Source Address, client versions (too few clients)

• TTL, TCP options, TCP window size (artifacts)

• Checksum errors, “crud”, invalid keywords and values (too clean)

[Figure: r plotted against time for simulated and real traffic.]

Approach 5. Injecting Real Background Traffic

• Collected on a university departmental web server

• Filtered: truncated inbound client traffic only

• IDS modified to avoid conditioning on traffic source

[Figure: test network with SunOS, Solaris, Linux, and WinNT hosts, the IDS, the Internet (simulated and real), attacks, and the real web server.]

Mixed Traffic: Fewer Detections, but More are Legitimate

[Bar chart: total and legitimate detections (axis 0 to 140) for PHAD, ALAD, LERAD, and NETAD.]

Detections vs. False Alarms (Simulated and Combined Traffic)

[Plot: detections out of 148 (axis 0 to 125) against false alarms (axis 0 to 500) for NETAD-S, LERAD-S, NETAD-C, and LERAD-C.]

Results Summary

• Original 1999 evaluation: 40-55% detected at 10 false alarms per day

• NETAD (excluding U2R): 75%

• Mixed traffic: LERAD + NETAD: 30%

• At 50 FA/day: NETAD: 47%

Contributions

1. Protocol modeling

2. Time based modeling for bursty traffic

3. Rule learning

4. Continuous modeling

5. Removing simulation artifacts

Limitations

• False alarms – Unusual data is not always hostile

• Rule learning requires 2 passes (not continuous)

• Tests with real traffic are not reproducible (privacy concerns)

• Unlabeled attacks in real traffic

– GET /MSADC/root.exe?/c+dir HTTP/1.0

– GET /scripts/..%255c%255c../winnt/system32/cmd.exe?/c+dir

Future Work

• Modify rule learning for continuous traffic

• Add other attributes

• User feedback (should this anomaly be added to the model?)

• Test with real attacks

Acknowledgments

• Philip K. Chan – Directing research

• Advisors – Ryan Stansifer, Kamel Rekab, James Whittaker

• Ongoing work

– Gaurav Tandon – Host based detection using LERAD (system call arguments)

– Rachna Vargiya – Parsing application payload

– Hyoung Rae Kim – Payload lexical/semantic analysis

– Muhammad Arshad – Outlier detection in network traffic

• DARPA – Providing funding and test data
