Machine Learning for Network Anomaly Detection Matt Mahoney



Machine Learning for Network Anomaly Detection

Matt Mahoney


Network Anomaly Detection

• Network – Monitors traffic to protect connected hosts

• Anomaly – Models normal behavior to detect novel attacks (some false alarms)

• Detection – Was there an attack?


Host Based Methods

• Virus Scanners

• File System Integrity Checkers (Tripwire, DERBI)

• Audit Logs

• System Call Monitoring – Self/Nonself (Forrest)


Network Based Methods

• Firewalls

• Signature Detection (SNORT, Bro)

• Anomaly Detection (eBayes, NIDES, ADAM, SPADE)


User Modeling

• Source address – unauthorized users of authenticated services (telnet, ssh, pop3, imap)

• Destination address – IP scans

• Destination port – port scans


Frequency Based Models

• Used by SPADE, ADAM, NIDES, eBayes, etc.

• Anomaly score = 1/P(event)

• Event probabilities estimated by counting
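The counting approach above can be sketched in a few lines of Python. This is only an illustration, not the code of SPADE, ADAM, NIDES, or eBayes: the class name `FrequencyModel` is hypothetical, and returning an infinite score for an event never seen in training is an assumption (a practical system would more likely smooth its estimates).

```python
from collections import Counter


class FrequencyModel:
    """Scores an event as 1/P(event), with P estimated from training counts."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def train(self, events):
        # Estimate probabilities by counting how often each event occurs.
        for event in events:
            self.counts[event] += 1
            self.total += 1

    def score(self, event):
        count = self.counts[event]  # a Counter returns 0 for unseen keys
        if count == 0:
            # Assumption: an event never seen in training scores infinity.
            return float("inf")
        # 1/P(event) = total / count
        return self.total / count
```

With destination ports as events, a port seen in 8 of 10 training connections scores 10/8 = 1.25, while one seen in 2 of 10 scores 5.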


Attacks on Public Services

PHF – exploits a CGI script bug on older Apache web servers

GET /cgi-bin/phf?Qalias=x%0a/usr/bin/ypcat%20passwd
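To see what the request above carries, it helps to percent-decode the query string. The short Python sketch below uses only the standard library; the variable names are illustrative. It shows that %0a decodes to a newline and %20 to a space, so the text after the newline reads as a separate command line, which is how this bug is usually described as being exploited.

```python
from urllib.parse import unquote, urlsplit

# Request target taken from the slide.
request_target = "/cgi-bin/phf?Qalias=x%0a/usr/bin/ypcat%20passwd"

query = urlsplit(request_target).query   # text after the "?"
decoded = unquote(query)                 # %0a -> newline, %20 -> space

# An embedded newline in a CGI argument is the unusual part of this request.
has_newline = "\n" in decoded

print(repr(decoded))
print("newline in argument:", has_newline)
```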


Buffer Overflows

• 1988 Morris Worm – fingerd

• 2003 SQL Sapphire Worm

char buf[100];
gets(buf);

[Figure: stack diagram labeling buf (offsets 0 to 100), the exploit code, and the return address]


TCP/IP Denial of Service Attacks

• Teardrop – overlapping IP fragments

• Ping of Death – IP fragments reassemble to > 64K

• Dosnuke – urgent data in NetBIOS packet

• Land – identical source and destination addresses
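Three of the conditions listed above are simple enough to state as code. The Python sketch below is illustrative only: the function names are hypothetical, fragments are represented as (offset, length) pairs in bytes, and IP header sizes are ignored. The 65535-byte limit is the largest value the 16-bit IP total length field can hold.

```python
MAX_IP_DATAGRAM = 65535  # largest value of the 16-bit IP total length field


def is_land(src_addr, dst_addr):
    """Land: source and destination addresses are identical."""
    return src_addr == dst_addr


def reassembled_size(fragments):
    """Size in bytes after reassembly; fragments are (offset, length) pairs."""
    return max(offset + length for offset, length in fragments)


def is_ping_of_death(fragments):
    """Ping of Death: fragments reassemble to more than the maximum datagram."""
    return reassembled_size(fragments) > MAX_IP_DATAGRAM


def has_overlap(fragments):
    """Teardrop: some fragment starts before an earlier fragment has ended."""
    end = 0
    for offset, length in sorted(fragments):
        if offset < end:
            return True
        end = max(end, offset + length)
    return False
```

These are fixed checks for known attacks, shown here only to make the attack descriptions concrete; they are not the anomaly models discussed in the rest of the talk.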


Protocol Modeling

• Attacks exploit bugs

• Bugs are most common in the least tested code

• Most testing occurs after delivery

• Therefore unusual data is more likely to be hostile


Protocol Models

• PHAD, NETAD – Packet Headers (Ethernet, IP, TCP, UDP, ICMP)

• ALAD, LERAD – Client TCP application payloads (HTTP, SMTP, FTP, …)


Time Based Models

• Training and test phases

• Values never seen in training are suspicious

• Score = t/p = tn/r where

– t = time since last anomaly
– n = number of training examples
– r = number of allowed values
– p = r/n = fraction of values that are novel


Example tn/r

• Training: 0000111000, n/r = 10/2

• Testing: 01223

– 0: no score
– 1: no score
– 2: tn/r = 6 × 10/2 = 30
– 2: tn/r = 1 × 10/2 = 5
– 3: tn/r = 1 × 10/2 = 5
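The arithmetic in this example can be reproduced with a short Python sketch. The function names are hypothetical. Here n and r are learned from the training string, while the values of t (6, 1, 1) are taken directly from the slide rather than derived, since the slide does not spell out how time is counted.

```python
def learn_n_r(training_values):
    """n = number of training examples, r = number of distinct (allowed) values."""
    n = len(training_values)
    r = len(set(training_values))
    return n, r


def anomaly_score(t, n, r):
    """Score = t/p = t*n/r, where p = r/n."""
    return t * n / r


training = "0000111000"
allowed = set(training)       # {"0", "1"}; fixed once training ends
n, r = learn_n_r(training)    # n = 10, r = 2

# Test values paired with the t values given on the slide (None = not needed).
test = [("0", None), ("1", None), ("2", 6), ("2", 1), ("3", 1)]
for value, t in test:
    if value in allowed:
        print(value, "no score")
    else:
        print(value, anomaly_score(t, n, r))
```

The second "2" still receives a score, which is consistent with the allowed set staying fixed during testing.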


PHAD – Fixed Rules

• 34 packet header fields

– Ethernet (address, protocol)
– IP (TOS, TTL, fragmentation, addresses)
– TCP (options, flags, port numbers)
– UDP (port numbers, checksum)
– ICMP (type, code, checksum)

• Global model
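A global per-field model of this kind might look like the Python sketch below. It is not PHAD's actual code. The class names are hypothetical, only two of the 34 fields are used in the usage example, and two details are assumptions not stated on the slide: a packet's score is the sum of tn/r over its anomalous fields, and t is measured per field from the last time that field saw a novel value (in training or testing).

```python
class FieldModel:
    """Tracks the allowed values of one header field and scores novel ones."""

    def __init__(self):
        self.allowed = set()      # values seen in training (r = len(allowed))
        self.n = 0                # number of training examples
        self.last_anomaly = 0.0   # time this field last saw a novel value

    def train(self, value, now):
        self.n += 1
        if value not in self.allowed:
            self.allowed.add(value)
            self.last_anomaly = now

    def score(self, value, now):
        if self.n == 0 or value in self.allowed:
            return 0.0
        t = now - self.last_anomaly
        self.last_anomaly = now
        return t * self.n / len(self.allowed)


class PacketModel:
    """One FieldModel per field, shared by all traffic (a global model)."""

    def __init__(self, field_names):
        self.fields = {name: FieldModel() for name in field_names}

    def train(self, packet, now):
        for name, model in self.fields.items():
            if name in packet:
                model.train(packet[name], now)

    def score(self, packet, now):
        # Assumption: the packet score is the sum of the per-field scores.
        return sum(model.score(packet[name], now)
                   for name, model in self.fields.items() if name in packet)
```

For example, after training on four packets with TTL values 64, 64, 128, 64 at times 1 to 4, a packet with TTL 255 at time 10 scores 7 × 4/2 = 14.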


LERAD – Learns Conditional Rules

• Models inbound client TCP (addresses, ports, flags, 8 words in payload)

• Learns conditional rules

If port = 80 then word1 = GET, POST (n/r = 10000/2)


LERAD Rule Learning

• If word1 = GET then port = 80 (n/r = 2/1)

• word1 = GET, HELO (n/r = 3/2)

• If address = Marx then port = 80, 25 (n/r = 2/2)

Address  Port  Word1  Word2
Hume     80    GET    /
Marx     80    GET    /index.html
Marx     25    HELO   Pascal
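The n/r values for the three rules above follow directly from the table: n counts the rows that satisfy the rule's condition, and r counts the distinct values of the target attribute among those rows. A minimal Python sketch, with the hypothetical helper `rule_n_r`, reproduces them.

```python
rows = [
    {"address": "Hume", "port": 80, "word1": "GET", "word2": "/"},
    {"address": "Marx", "port": 80, "word1": "GET", "word2": "/index.html"},
    {"address": "Marx", "port": 25, "word1": "HELO", "word2": "Pascal"},
]


def rule_n_r(rows, condition, target):
    """Return (n, allowed) for the rule 'if condition then target in allowed'.

    condition maps attribute names to required values; an empty dict
    means the rule is unconditional. r is len(allowed).
    """
    matching = [row for row in rows
                if all(row[attr] == value for attr, value in condition.items())]
    allowed = {row[target] for row in matching}
    return len(matching), allowed


for condition, target in [({"word1": "GET"}, "port"),
                          ({}, "word1"),
                          ({"address": "Marx"}, "port")]:
    n, allowed = rule_n_r(rows, condition, target)
    print(condition, "->", target, "in", sorted(allowed, key=str),
          f"(n/r = {n}/{len(allowed)})")
```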


LERAD Rule Learning

• Randomly pick rules based on matching attributes

• Select nonoverlapping rules with high n/r on a sample

• Train on full training set (new n/r)

• Discard rules that discover novel values in last 10% of training (known false alarms)
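The final step, discarding rules that still meet novel values late in training, can be sketched as follows. This is an illustration under stated assumptions, not LERAD's implementation: the function names and the `holdout_fraction` parameter are hypothetical, rows are dictionaries in time order, and the random rule generation and selection steps are not shown.

```python
def matches(row, condition):
    """True if the row satisfies every attribute = value pair in the condition."""
    return all(row[attr] == value for attr, value in condition.items())


def passes_validation(rows, condition, target, holdout_fraction=0.1):
    """Keep a rule only if the last part of training adds no new allowed value.

    A rule that meets a novel value in attack-free training data would have
    raised a false alarm, so it is discarded.
    """
    holdout = max(1, int(len(rows) * holdout_fraction))
    split = len(rows) - holdout

    # Allowed values learned from the earlier part of the training data.
    allowed = {row[target] for row in rows[:split] if matches(row, condition)}

    # Any novel value in the final part disqualifies the rule.
    for row in rows[split:]:
        if matches(row, condition) and row[target] not in allowed:
            return False
    return True
```

For instance, the rule "if port = 80 then word1 = ..." is discarded if POST first appears in the final tenth of the training rows, but kept if POST was already seen earlier.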


DARPA/Lincoln Labs Evaluation

• 1 week of attack-free training data

• 2 weeks with 201 attacks

[Figure: network diagram with SunOS, Solaris, Linux, and NT hosts, a router, the Internet, a sniffer, and attacks]


Attacks out of 201 Detected at 10 False Alarms per Day

[Figure: bar chart of attacks detected (axis 0 to 140) for PHAD, ALAD, LERAD, and NETAD]


Problems with Synthetic Traffic

• Attributes are too predictable: TTL, TOS, TCP options, TCP window size, HTTP, SMTP command formatting

• Too few sources: Client addresses, HTTP user agents, ssh versions

• Too “clean”: no checksum errors, fragmentation, garbage data in reserved fields, malformed commands


Real Traffic is Less Predictable

[Figure: r (number of values) versus time, for synthetic and real traffic]


Mixed Traffic: Fewer Detections, but More are Legitimate

[Figure: bar chart (axis 0 to 140) of total and legitimate detections for PHAD, ALAD, LERAD, and NETAD]


Project Status

• Philip K. Chan – Project Leader

• Gaurav Tandon – Applying LERAD to system call arguments

• Rachna Vargiya – Application payload tokenization

• Mohammad Arshad – Network traffic outlier analysis by clustering


Further Reading

• Learning Nonstationary Models of Normal Network Traffic for Detecting Novel Attacks by Matthew V. Mahoney and Philip K. Chan, Proc. KDD.

• Network Traffic Anomaly Detection Based on Packet Bytes by Matthew V. Mahoney, Proc. ACM-SAC.

• http://cs.fit.edu/~mmahoney/dist/