Transcript
Page 1

Machine Learning for Network Anomaly Detection

Matt Mahoney

Page 2

Network Anomaly Detection

• Network – Monitors traffic to protect connected hosts

• Anomaly – Models normal behavior to detect novel attacks (some false alarms)

• Detection – Was there an attack?

Page 3

Host Based Methods

• Virus Scanners

• File System Integrity Checkers (Tripwire, DERBI)

• Audit Logs

• System Call Monitoring – Self/Nonself (Forrest)

Page 4

Network Based Methods

• Firewalls

• Signature Detection (SNORT, Bro)

• Anomaly Detection (eBayes, NIDES, ADAM, SPADE)

Page 5

User Modeling

• Source address – unauthorized users of authenticated services (telnet, ssh, pop3, imap)

• Destination address – IP scans

• Destination port – port scans

Page 6

Frequency Based Models

• Used by SPADE, ADAM, NIDES, eBayes, etc.

• Anomaly score = 1/P(event)

• Event probabilities estimated by counting
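
The counting scheme can be illustrated with a short sketch (a simplified illustration, not the actual SPADE/ADAM/NIDES/eBayes code); the event tuples and the add-one smoothing for unseen events are assumptions made here:

    # Estimate P(event) by counting training events, then score a
    # test event as 1/P(event).
    from collections import Counter

    train_events = [("10.0.0.1", 80), ("10.0.0.1", 80), ("10.0.0.2", 25)]
    counts = Counter(train_events)
    total = len(train_events)

    def score(event):
        # Unseen events get a pseudo-count of 1 so that P(event) > 0
        # (an assumption of this sketch, not part of the slides).
        p = (counts[event] + 1) / (total + 1)
        return 1.0 / p

    print(score(("10.0.0.1", 80)))    # frequent event -> low score
    print(score(("10.0.0.9", 6667)))  # never seen -> high score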

Page 7

Attacks on Public Services

PHF – exploits a CGI script bug on older Apache web servers

GET /cgi-bin/phf?Qalias=x%0a/usr/bin/ypcat%20passwd

Page 8

Buffer Overflows

• 1988 Morris Worm – fingerd

• 2003 SQL Sapphire Worm

    char buf[100];
    gets(buf);   /* no bounds check: input longer than 100 bytes overruns buf on the stack */

[Stack diagram: buf occupies bytes 0 to 100 on the stack, followed by the return address; exploit code overflowing buf overwrites the return address]

Page 9

TCP/IP Denial of Service Attacks

• Teardrop – overlapping IP fragments

• Ping of Death – IP fragments reassemble to > 64K

• Dosnuke – urgent data in NetBIOS packet

• Land – identical source and destination addresses

Page 10

Protocol Modeling

• Attacks exploit bugs

• Bugs are most common in the least tested code

• Most testing occurs after delivery

• Therefore unusual data is more likely to be hostile

Page 11

Protocol Models

• PHAD, NETAD – Packet Headers (Ethernet, IP, TCP, UDP, ICMP)

• ALAD, LERAD – Client TCP application payloads (HTTP, SMTP, FTP, …)

Page 12

Time Based Models

• Training and test phases

• Values never seen in training are suspicious

• Score = t/p = tn/r, where:
  – t = time since last anomaly
  – n = number of training examples
  – r = number of allowed values
  – p = r/n = fraction of values that are novel

Page 13

Example tn/r

• Training: 0000111000 (n/r = 10/2)

• Testing: 01223
  – 0: no score
  – 1: no score
  – 2: tn/r = 6 x 10/2 = 30
  – 2: tn/r = 1 x 10/2 = 5
  – 3: tn/r = 1 x 10/2 = 5
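
A minimal single-attribute sketch of tn/r scoring (not the actual PHAD/LERAD code). How t is carried over from training into testing is an implementation detail, so the score of the first novel value may differ from the example above; the later anomalies score 1 x 10/2 = 5 as shown:

    # tn/r scoring for one attribute: values never seen in training are
    # anomalies, and the score grows with the time since the last anomaly.
    def train(values):
        allowed = set(values)                  # the r allowed values
        return allowed, len(values), len(allowed)

    def test(values, allowed, n, r):
        t = 1                                  # time since last anomaly (assumed starting value)
        scores = []
        for v in values:
            if v in allowed:
                scores.append(0)               # seen in training: no score
            else:
                scores.append(t * n / r)
                t = 0                          # reset the clock after an anomaly
            t += 1
        return scores

    allowed, n, r = train("0000111000")
    print(test("01223", allowed, n, r))        # only the novel values 2, 2, 3 score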

Page 14

PHAD – Fixed Rules

• 34 packet header fields:
  – Ethernet (address, protocol)
  – IP (TOS, TTL, fragmentation, addresses)
  – TCP (options, flags, port numbers)
  – UDP (port numbers, checksum)
  – ICMP (type, code, checksum)

• Global model
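
A sketch of how a global, per-field model could combine scores, assuming (as in the PHAD paper) that a packet's anomaly score is the sum of tn/r over its anomalous fields. Counting t in packets rather than seconds, and the example field names and values, are simplifications made here:

    # One tn/r detector per header field; a packet is scored by summing
    # the contributions of fields holding values never seen in training.
    def packet_score(packet, models):
        # packet: {field name: value}; models: {field name: {'allowed', 'n', 'r', 't'}}
        score = 0.0
        for field, value in packet.items():
            m = models[field]
            if value not in m['allowed']:
                score += m['t'] * m['n'] / m['r']
                m['t'] = 0                     # reset time since last anomaly
            m['t'] += 1
        return score

    models = {'ttl': {'allowed': {64, 128}, 'n': 1000, 'r': 2, 't': 1}}
    print(packet_score({'ttl': 255}, models))  # novel TTL -> 1 * 1000 / 2 = 500.0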

Page 15

LERAD – Learns Conditional Rules

• Models inbound client TCP (addresses, ports, flags, 8 words in payload)

• Learns conditional rules

If port = 80 then word1 = GET, POST (n/r = 10000/2)
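
One way to represent such a rule is sketched below (a hypothetical illustration, not LERAD's actual data structures): the rule applies only when its antecedent matches, n counts the training samples it applied to, and r counts the distinct consequent values it has seen:

    class Rule:
        def __init__(self, antecedent, target):
            self.antecedent = antecedent       # e.g. {'port': 80}
            self.target = target               # e.g. 'word1'
            self.allowed = set()               # values seen for the target attribute
            self.n = 0                         # training samples the rule applied to

        def applies(self, sample):
            return all(sample.get(k) == v for k, v in self.antecedent.items())

        def train(self, sample):
            if self.applies(sample):
                self.n += 1
                self.allowed.add(sample[self.target])

        @property
        def r(self):
            return len(self.allowed)

    rule = Rule({'port': 80}, 'word1')
    rule.train({'port': 80, 'word1': 'GET'})
    rule.train({'port': 80, 'word1': 'POST'})
    print(rule.n, rule.r)                      # n/r = 2/2 on this tiny sample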

Page 16

LERAD Rule Learning

• If word1 = GET then port = 80 (n/r = 2/1)

• word1 = GET, HELO (n/r = 3/2)

• If address = Marx then port = 80, 25 (n/r = 2/2)

Address  Port  Word1  Word2
Hume     80    GET    /
Marx     80    GET    /index.html
Marx     25    HELO   Pascal

Page 17

LERAD Rule Learning

• Randomly pick rules based on matching attributes

• Select nonoverlapping rules with high n/r on a sample

• Train on full training set (new n/r)

• Discard rules that discover novel values in the last 10% of training (known false alarms)
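
The last step can be sketched as follows, reusing the hypothetical Rule class from the earlier sketch and simplifying the bookkeeping: a rule that would flag a novel value in the final 10% of the attack-free training data is a known false alarm, so it is dropped:

    def validate(rules, training_samples):
        split = int(0.9 * len(training_samples))
        early, late = training_samples[:split], training_samples[split:]
        kept = []
        for rule in rules:
            for s in early:
                rule.train(s)                  # learn allowed values on the first 90%
            false_alarm = any(
                rule.applies(s) and s[rule.target] not in rule.allowed
                for s in late
            )
            if not false_alarm:
                kept.append(rule)              # rules that stay quiet are retained
        return kept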

Page 18

DARPA/Lincoln Labs Evaluation

• 1 week of attack-free training data

• 2 weeks with 201 attacks

[Diagram: attacks arrive from the Internet through a router to SunOS, Solaris, Linux, and NT victim hosts; a sniffer on the link records the traffic]

Page 19

Attacks out of 201 Detected at 10 False Alarms per Day

[Bar chart: attacks detected (0-140 scale) for PHAD, ALAD, LERAD, and NETAD]

Page 20

Problems with Synthetic Traffic

• Attributes are too predictable: TTL, TOS, TCP options, TCP window size, HTTP, SMTP command formatting

• Too few sources: Client addresses, HTTP user agents, ssh versions

• Too “clean”: no checksum errors, fragmentation, garbage data in reserved fields, malformed commands

Page 21

Real Traffic is Less Predictable

[Plot: r (number of values) vs. time for synthetic and real traffic]

Page 22

Mixed Traffic: Fewer Detections, but More are Legitimate

[Bar chart: total and legitimate detections (0-140 scale) for PHAD, ALAD, LERAD, and NETAD on mixed traffic]

Page 23

Project Status

• Philip K. Chan – Project Leader

• Gaurav Tandon – Applying LERAD to system call arguments

• Rachna Vargiya – Application payload tokenization

• Mohammad Arshad – Network traffic outlier analysis by clustering

Page 24

Further Reading

• Learning Nonstationary Models of Normal Network Traffic for Detecting Novel Attacks by Matthew V. Mahoney and Philip K. Chan, Proc. KDD.

• Network Traffic Anomaly Detection Based on Packet Bytes by Matthew V. Mahoney, Proc. ACM-SAC.

• http://cs.fit.edu/~mmahoney/dist/