A Machine Learning Approach to Detecting Attacks by Identifying Anomalies in Network Traffic

A Dissertation by Matthew V. Mahoney

Major Advisor: Philip K. Chan




Overview

• Related work in intrusion detection

• Approach

• Experimental results
  – Simulated network
  – Real background traffic

• Conclusions and future work


Limitations of Intrusion Detection

• Host-based (audit logs, virus checkers, system calls (Forrest 1996))
  – Cannot be trusted after a compromise

• Network signature detection (SNORT (Roesch 1999), Bro (Paxson 1998))
  – Cannot detect novel attacks
  – Alarms occur in bursts

• Address/port anomaly detection (ADAM (Barbara 2001), SPADE (Hoagland 2000), eBayes (Valdes & Skinner 2000))
  – Cannot detect attacks on public servers (web, mail)


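Intrusion Detection Dimensions

Figure: intrusion detection systems arranged along three dimensions, Model (anomaly vs. signature), Data (network vs. host), and Method (user vs. system). Systems shown: BSM, virus detection, SNORT, Bro, audit logs, firewalls, SPADE, ADAM, eBayes, and network protocol anomaly detection.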


Problem Statement

• Detect (not prevent) attacks in network traffic

• No prior knowledge of attack characteristics

Figure: training data with no known attacks builds a model of normal traffic; the IDS applies this model to test data with attacks and outputs alarms.


Approach

1. Model protocols (extend user model)

2. Time-based model of “bursty” traffic

3. Learn conditional rules

4. Batch and continuous modeling

5. Test with simulated attacks and real background traffic


Approach 1. Protocol Modeling

• User model (conventional)
  – Source address for authentication
  – Destination port to detect scans

• Protocol model (new)
  – Unusual features (more likely to be vulnerable)
  – Client idiosyncrasies
  – IDS evasion
  – Victim’s symptoms after an attack


Example Protocol Anomalies

Attack | How detected | Category
Teardrop: overlapping IP fragments crash the target | IP fragments | Unusual feature
Sendmail: buffer overflow gives a remote root shell | Lowercase mail | Idiosyncrasy
FIN scan (portsweep): FIN packets are not logged | FIN without ACK | Evasion
ARPpoison: forged replies to ARP who-has requests | Interrupted TCP | Victim symptoms
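The "FIN without ACK" entry is a single TCP header feature. As a minimal sketch (not code from the dissertation, whose systems learn such conditions from training data rather than hard-coding them), the check below tests a TCP flag byte using the standard control-bit values from RFC 793; the function name `fin_without_ack` is hypothetical.

```python
# TCP control bits as laid out in the TCP header flag byte (RFC 793).
FIN = 0x01
SYN = 0x02
RST = 0x04
PSH = 0x08
ACK = 0x10
URG = 0x20


def fin_without_ack(flags: int) -> bool:
    """Return True if FIN is set but ACK is not.

    Once a connection is established, TCP segments carry ACK, so a bare FIN
    (as sent by a FIN scan) is an unusual header value.
    """
    return bool(flags & FIN) and not (flags & ACK)


if __name__ == "__main__":
    print(fin_without_ack(FIN))        # bare FIN, as in a FIN scan
    print(fin_without_ack(FIN | ACK))  # ordinary connection close
```

An anomaly detector would not need this rule written by hand: if every FIN seen in training also carried ACK, a bare FIN is simply a value never seen before.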


Approach 2. Non-Poisson Traffic Model (Paxson & Floyd, 1995)

• Events occur in bursts on all time scales

• Long range dependency

• No average rate of events

• Event probability depends on
  – The average rate in the past
  – And the time since it last occurred


Time-Based Model

If port = 25 then word1 = HELO or EHLO

• Anomaly: any value never seen in training

• Score = tn/r
  – t = time since the last anomaly for this rule
  – n = number of training instances satisfying the condition (port = 25)
  – r = number of allowed values (2: HELO and EHLO)

• Only the first anomaly in a burst receives a high score
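A minimal sketch of how one such rule could be scored, assuming a record is a dict of attribute values and time is a plain number. The class `TimeBasedRule`, the training count n = 1000, the starting `last_anomaly = 0`, and the timestamps in the usage example are all hypothetical; only the score tn/r comes from the slide.

```python
from dataclasses import dataclass


@dataclass
class TimeBasedRule:
    """Hypothetical sketch of one rule: if <condition> then <attribute> in <allowed>.

    n is the number of training instances that satisfied the condition, and
    r = len(allowed) is the number of values seen for the attribute in training.
    """
    condition: dict
    attribute: str
    allowed: set
    n: int
    last_anomaly: float = 0.0  # time of the most recent anomaly for this rule

    def score(self, record: dict, now: float) -> float:
        # The rule says nothing about records that fail its condition.
        if any(record.get(k) != v for k, v in self.condition.items()):
            return 0.0
        # Values seen in training are never anomalous in this model.
        if record.get(self.attribute) in self.allowed:
            return 0.0
        t = now - self.last_anomaly
        # Resetting the clock makes later anomalies in the same burst score low.
        self.last_anomaly = now
        return t * self.n / len(self.allowed)


if __name__ == "__main__":
    smtp = TimeBasedRule({"port": 25}, "word1", {"HELO", "EHLO"}, n=1000)
    print(smtp.score({"port": 25, "word1": "helo"}, now=100))  # 50000.0
    print(smtp.score({"port": 25, "word1": "helo"}, now=101))  # 500.0
    print(smtp.score({"port": 80, "word1": "GET"}, now=102))   # 0.0
```

The second lowercase "helo" arrives one time unit after the first, so it scores 100 times lower, which is the "only the first anomaly in a burst" behavior.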


Example

Training = AAAABBBBAA, Test = AACCC

• C is an anomaly

• r/n = average rate of training anomalies = 2/10 (first A and first B)

• t = time since last anomaly = 9, 1, 1

• Score (C) = tn/r = 45, 5, 5
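The arithmetic of this example can be checked directly. This sketch derives n and r from the training string and takes the three t values exactly as stated on the slide rather than recomputing them; the function names are hypothetical.

```python
def training_stats(training: str) -> tuple[int, int]:
    """Return (n, r): number of training values and number of distinct values."""
    return len(training), len(set(training))


def score(t: float, n: int, r: int) -> float:
    """Time-based anomaly score t * n / r."""
    return t * n / r


n, r = training_stats("AAAABBBBAA")     # n = 10, r = 2
times = [9, 1, 1]                       # t for the three C's, as given on the slide
print([score(t, n, r) for t in times])  # [45.0, 5.0, 5.0]
```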


Approach 3. Rule Learning

1. Sample training pairs to suggest rules with n/r = 2/1

2. Remove redundant rules, favoring high n/r

3. Validation: remove rules that generate alarms on attack-free traffic


Learning Step 1 - Sampling

Port | Word1 | Word2 | Word3
80 | GET | / | HTTP/1.0
80 | GET | /index.html | HTTP/1.0

• If port = 80 then word1 = GET

• word3 = HTTP/1.0

• If word3 = HTTP/1.0 and word1 = GET then port = 80
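One way a sampled pair could be turned into the candidate rules listed above, as a sketch under an assumed scheme: visit the attributes on which the two records agree in some order (for example a random permutation), make the first one the consequent, and add each later one as an extra condition. The helper `rules_from_pair` and this ordering scheme are our illustration; the slide shows only the resulting rules, each of which has n/r = 2/1 on the pair.

```python
def rules_from_pair(a: dict, b: dict, order: list) -> list:
    """Suggest candidate rules from two training records (hypothetical helper).

    Attributes are visited in `order` (e.g. a random permutation). Only those on
    which a and b agree are used: the first becomes the consequent, and each later
    one is added as one more condition. Rules are (conditions, attribute, value).
    """
    matching = [k for k in order if k in a and k in b and a[k] == b[k]]
    if not matching:
        return []
    target, extra = matching[0], matching[1:]
    return [({k: a[k] for k in extra[:i]}, target, a[target])
            for i in range(len(extra) + 1)]


r1 = {"port": 80, "word1": "GET", "word2": "/", "word3": "HTTP/1.0"}
r2 = {"port": 80, "word1": "GET", "word2": "/index.html", "word3": "HTTP/1.0"}
for rule in rules_from_pair(r1, r2, ["port", "word3", "word1", "word2"]):
    print(rule)
```

word2 is skipped because the two requests disagree on it. The ordering shown yields the third rule on the slide as its most specific rule; the orderings ["word1", "port"] and ["word3"] yield the other two.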


Learning Step 2 – Remove Redundant Rules (Sorted by n/r)

• R1: if port = 80 then word1 = GET (n/r = 2/1, OK)

• R2: word1 = HELO or GET (n/r = 3/2, OK)

• R3: if port = 25 then word1 = HELO (n/r = 1/1, remove)

• R4: word2 = pascal, /, or /index.html (n/r = 3/3, OK)

Port | Word1 | Word2 | Word3
25 | HELO | pascal | MAIL
80 | GET | / | HTTP/1.0
80 | GET | /index.html | HTTP/1.0
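A sketch of one redundancy test that reproduces this example, assuming that a rule is kept only if it covers some (record, attribute) cell not already covered by a higher-ranked rule. That coverage criterion is our reading, chosen because it removes R3 (its only cell, word1 of the SMTP record, is already covered by R2) and keeps the others; the names `Rule`, `n_over_r`, and `remove_redundant` are hypothetical.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    name: str
    conditions: dict   # attribute -> required value ({} means no condition)
    attribute: str     # consequent attribute


def applies(rule: Rule, record: dict) -> bool:
    return all(record.get(k) == v for k, v in rule.conditions.items())


def n_over_r(rule: Rule, data: list) -> float:
    """n = records meeting the conditions, r = distinct consequent values among them."""
    values = [rec.get(rule.attribute) for rec in data if applies(rule, rec)]
    return len(values) / len(set(values)) if values else 0.0


def remove_redundant(rules: list, data: list) -> list:
    """Keep a rule only if it covers a (record, attribute) cell not already
    covered by a rule with higher n/r. sorted() is stable, so ties keep input order."""
    covered: set = set()
    kept = []
    for rule in sorted(rules, key=lambda ru: n_over_r(ru, data), reverse=True):
        cells = {(i, rule.attribute) for i, rec in enumerate(data) if applies(rule, rec)}
        if cells - covered:
            kept.append(rule)
            covered |= cells
    return kept


data = [
    {"port": 25, "word1": "HELO", "word2": "pascal", "word3": "MAIL"},
    {"port": 80, "word1": "GET", "word2": "/", "word3": "HTTP/1.0"},
    {"port": 80, "word1": "GET", "word2": "/index.html", "word3": "HTTP/1.0"},
]
rules = [
    Rule("R1", {"port": 80}, "word1"),
    Rule("R2", {}, "word1"),
    Rule("R3", {"port": 25}, "word1"),
    Rule("R4", {}, "word2"),
]
print([r.name for r in remove_redundant(rules, data)])  # ['R1', 'R2', 'R4']
```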


Learning Step 3 – Rule Validation

• Training (no attacks)
  – Learn rules, n/r

• Validation (no attacks)
  – Discard rules that generate alarms

• Testing (with attacks)

Figure: data used in time order: Train → Validate → Test
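The validation step amounts to a filter. In this minimal sketch a rule is a hypothetical (conditions, attribute, allowed values) tuple, and a rule "generates an alarm" when a validation record meets its conditions but carries a consequent value never seen in training; the validation record with "/about.html" is invented for illustration.

```python
def fires(conditions: dict, attribute: str, allowed: set, record: dict) -> bool:
    """True if the record meets the conditions but has a consequent value never seen in training."""
    if any(record.get(k) != v for k, v in conditions.items()):
        return False
    return record.get(attribute) not in allowed


def validate(rules: list, validation: list) -> list:
    """Drop every rule that raises at least one alarm on attack-free validation data."""
    return [rule for rule in rules
            if not any(fires(*rule, rec) for rec in validation)]


rules = [
    ({"port": 80}, "word1", {"GET"}),
    ({}, "word2", {"pascal", "/", "/index.html"}),
]
validation = [{"port": 80, "word1": "GET", "word2": "/about.html"}]
print(validate(rules, validation))  # only the port-80 rule survives
```

Because the validation data is attack-free, any alarm there is a false alarm, so a rule that fires on it is likely to over-fit the training set.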


Approach 4. Continuous Modeling

• No separate training and test phases

• Training data may contain attacks

• Model allows for previously seen values

• Score = tn/r + ti/fi

– ti = time since value i was last seen

– fi = frequency of i in training, fi > 0

• No validation step
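A sketch that evaluates the continuous score exactly as written, for a value i that has been seen before (so fi > 0). Two assumptions are ours: fi is taken as a count rather than a fraction, and ti and fi are read off a hypothetical event history of (time, value) pairs. The slide does not say how the two terms combine for a value never seen, so this sketch simply rejects that case.

```python
def value_stats(history: list, value: str, now: float) -> tuple:
    """Return (t_i, f_i): time since `value` was last seen and how often it was seen."""
    times = [t for t, v in history if v == value]
    if not times:
        raise ValueError("f_i must be > 0: value never seen")
    return now - max(times), len(times)


def continuous_score(t: float, n: int, r: int, t_i: float, f_i: int) -> float:
    """Score = t*n/r + t_i/f_i, as written on the slide (requires r > 0 and f_i > 0)."""
    if r <= 0 or f_i <= 0:
        raise ValueError("r and f_i must be positive")
    return t * n / r + t_i / f_i


history = [(1, "A"), (2, "A"), (5, "B")]    # hypothetical (time, value) events
t_i, f_i = value_stats(history, "A", now=10)  # (8, 2)
print(continuous_score(t=5, n=3, r=2, t_i=t_i, f_i=f_i))  # 7.5 + 4.0 = 11.5
```

The ti/fi term grows for values that are rare and have not appeared recently, which is how a previously seen value can still contribute to the score.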


Implementation

Model | Data | Conditions | Validation | Score
PHAD | Packet headers | None | No | tn/r
ALAD | TCP streams | Server, port | No | tn/r
LERAD | TCP streams | Learned | Yes | tn/r
NETAD | Packet bytes | Protocol | Yes | tn/r + ti/fi


Example Rules (LERAD)

Format: rule number, n/r, rule ("if then" means the rule has no condition)

1 39406/1 if SA3=172 then SA2 = 016
2 39406/1 if SA2=016 then SA3 = 172
3 28055/1 if F1=.UDP then F3 = .
4 28055/1 if F1=.UDP then F2 = .
5 28055/1 if F3=. then F1 = .UDP
6 28055/1 if F3=. then DUR = 0
7 27757/1 if DA0=100 then DA1 = 112
8 25229/1 if W6=. then W7 = .
9 25221/1 if W5=. then W6 = .
10 25220/1 if W4=. then W8 = .
11 25220/1 if W4=. then W5 = .
12 17573/1 if DA1=118 then W1 = .^B^A^@^@
13 17573/1 if DA1=118 then SA1 = 112
14 17573/1 if SP=520 then DP = 520
15 17573/1 if SP=520 then W2 = .^P^@^@^@
16 17573/1 if DP=520 then DA1 = 118
17 17573/1 if DA1=118 SA1=112 then LEN = 5
18 28882/2 if F2=.AP then F1 = .S .AS
19 12867/1 if W1=.^@GET then DP = 80
20 68939/6 if then DA1 = 118 112 113 115 114 116
21 68939/6 if then F1 = .UDP .S .AF .ICMP .AS .R
22 9914/1 if W3=.HELO then W1 = .^@EHLO
23 9914/1 if F1=.S W3=.HELO then DP = 25
24 9914/1 if DP=25 W5=.MAIL then W3 = .HELO


1999 DARPA IDS Evaluation (Lippmann et al. 2000)

• 7 days training data with no attacks

• 2 weeks test data with 177 visible attacks

• Must identify victim and time of attack

Figure: attacks from a simulated Internet against SunOS, Solaris, Linux, and WinNT victims, monitored by the IDS.


Attacks Detected at 10 FA/Day

Figure: bar chart of attacks detected at 10 false alarms per day by PHAD, ALAD, LERAD, NETAD, and the continuous model.
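Results "at 10 FA/day" fix a false-alarm budget and count what is detected above the corresponding score threshold. A minimal sketch, assuming each alarm has already been labeled with the attack it correctly identifies (victim and time) or None for a false alarm; the function name and the sample alarms are hypothetical, and the DARPA scoring rules themselves are not reproduced here.

```python
def detections_at_fa_rate(alarms, days: float, fa_per_day: float) -> set:
    """Return the distinct attacks detected before the false-alarm budget is exceeded.

    alarms: iterable of (score, attack_id), with attack_id None for a false alarm.
    The threshold is lowered one alarm at a time, highest score first.
    """
    budget = fa_per_day * days
    false_alarms = 0
    detected = set()
    for _score, attack in sorted(alarms, key=lambda a: a[0], reverse=True):
        if attack is None:
            false_alarms += 1
            if false_alarms > budget:
                break
        else:
            detected.add(attack)
    return detected


alarms = [(9, "a1"), (8, None), (7, "a2"), (6, None), (5, "a3"), (4, "a1")]
print(sorted(detections_at_fa_rate(alarms, days=1, fa_per_day=1)))  # ['a1', 'a2']
```

Raising the budget to 2 false alarms per day lets the threshold drop past the second false alarm, so a3 is also detected; this is the trade-off plotted in a detections vs. false alarms curve.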


Unlikely Detections

• Attacks on public servers (web, mail, DNS) detected by source address

• Application server attacks detected by packet header fields

• U2R (user to root) detected by FTP upload


Unrealistic Background Traffic

• Source address, client versions (too few clients)

• TTL, TCP options, TCP window size (artifacts)

• Checksum errors, “crud”, invalid keywords and values (too clean)

Figure: r (number of allowed values) vs. time for simulated and real traffic.


Approach 5. Injecting Real Background Traffic

• Collected on a university departmental web server

• Filtered: truncated inbound client traffic only

• IDS modified to avoid conditioning on traffic source

Figure: simulated and real Internet traffic, including attacks, monitored by the IDS; hosts shown are SunOS, Solaris, Linux, and WinNT victims and a real web server.


Mixed Traffic: Fewer Detections, but More are Legitimate

Figure: bar chart of total and legitimate detections on mixed traffic for PHAD, ALAD, LERAD, and NETAD.


Detections vs. False Alarms (Simulated and Combined Traffic)

Figure: detections (out of 148) vs. false alarms (0 to 500) for NETAD-S, LERAD-S, NETAD-C, and LERAD-C (S = simulated traffic, C = combined traffic).


Results Summary

• Original 1999 evaluation: 40-55% detected at 10 false alarms per day

• NETAD (excluding U2R): 75%

• Mixed traffic: LERAD + NETAD: 30%

• At 50 FA/day: NETAD: 47%


Contributions

1. Protocol modeling

2. Time-based modeling for bursty traffic

3. Rule learning

4. Continuous modeling

5. Removing simulation artifacts


Limitations

• False alarms
  – Unusual data is not always hostile

• Rule learning requires 2 passes (not continuous)

• Tests with real traffic are not reproducible (privacy concerns)

• Unlabeled attacks in real traffic
  – GET /MSADC/root.exe?/c+dir HTTP/1.0
  – GET /scripts/..%255c%255c../winnt/system32/cmd.exe?/c+dir


Future Work

• Modify rule learning for continuous traffic

• Add other attributes

• User feedback (should this anomaly be added to the model?)

• Test with real attacks


Acknowledgments

• Philip K. Chan – Directing research

• Advisors – Ryan Stansifer, Kamel Rekab, James Whittaker

• Ongoing work
  – Gaurav Tandon – Host-based detection using LERAD (system call arguments)
  – Rachna Vargiya – Parsing application payload
  – Hyoung Rae Kim – Payload lexical/semantic analysis
  – Muhammad Arshad – Outlier detection in network traffic

• DARPA – Providing funding and test data