Firewalls and Intrusion Detection Systems
David Brumley, dbrumley@cmu.edu, Carnegie Mellon University
IDS and Firewall Goals
Expressiveness: What kinds of policies can we write?
Effectiveness: How well does it detect attacks while avoiding false positives?
Efficiency: How many resources does it take, and how quickly does it decide?
Ease of use: How much training is necessary? Can a non-security expert use it?
Security: Can the system itself be attacked?
Transparency: How intrusive is it to use?
2
Firewalls
Dimensions:
1. Host vs. Network
2. Stateless vs. Stateful
3. Network Layer
3
Firewall Goals
Provide defense in depth by:
1. Blocking attacks against hosts and services
2. Controlling traffic between zones of trust
4
Logical Viewpoint
5
[Diagram: a firewall sits between Inside and Outside; each message m passes through it]
For each message m, the firewall either:
• Allows it, with or without modification
• Blocks it, by dropping it or sending a rejection notice
• Queues it
Placement
Host-based Firewall
Network-Based Firewall
6
[Diagram: a host-based firewall runs on the host itself, facing Outside; a network-based firewall sits between hosts A, B, C and Outside]
Host-based firewall features:
• Faithful to the local configuration
• Travels with you
Network-based firewall features:
• Protects the whole network
• Can make decisions over all traffic (e.g., traffic-based anomaly detection)
Parameters
Types of Firewalls
1. Packet Filtering
2. Stateful Inspection
3. Application proxy
Policies
1. Default allow
2. Default deny
7
Recall: Protocol Stack
8
Application (e.g., SSL)
Transport (e.g., TCP, UDP)
Network (e.g., IP)
Link layer (e.g., Ethernet)
Physical
[Diagram: encapsulation. An application message is split into TCP segments (TCP header + data); each segment is wrapped in an IP header, then framed with an Ethernet header and trailer.]
Stateless Firewall
Filter by packet header fields:
1. IP fields (e.g., src, dst)
2. Protocol (e.g., TCP, UDP, ...)
3. Flags (e.g., SYN, ACK)
[Diagram: the firewall filters at the network/transport layers, between Outside and Inside]
Example: only allow incoming DNS packets to nameserver A.A.A.A (see the sketch below):
• Allow UDP port 53 to A.A.A.A
• Deny UDP port 53 to all (fail-safe good practice)
e.g., ipchains in Linux 2.2
9
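To make the rule ordering concrete, here is a minimal stateless-filter sketch in Python (not from the slides; the nameserver address 10.0.0.53 is a made-up stand-in for A.A.A.A):

from dataclasses import dataclass

@dataclass
class Packet:
    proto: str      # "udp", "tcp", ...
    dst_ip: str
    dst_port: int

NAMESERVER = "10.0.0.53"   # hypothetical stand-in for A.A.A.A

# Ordered stateless rules: (predicate, action). First match wins.
RULES = [
    (lambda p: p.proto == "udp" and p.dst_port == 53 and p.dst_ip == NAMESERVER, "allow"),
    (lambda p: p.proto == "udp" and p.dst_port == 53, "deny"),
]

def filter_packet(pkt: Packet) -> str:
    for match, action in RULES:
        if match(pkt):
            return action
    return "deny"   # fail-safe default: deny anything not explicitly allowed

print(filter_packet(Packet("udp", "10.0.0.53", 53)))  # allow
print(filter_packet(Packet("udp", "10.0.0.99", 53)))  # deny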
Need to keep state
10
Example: TCP handshake (firewall between Inside and Outside)
• SYN (client to server): SN_C = rand_C, AN_C = 0
• SYN/ACK (server to client): SN_S = rand_S, AN_S = SN_C
• ACK (client to server): SN = SN_C + 1, AN = SN_S
Server states: Listening, then Wait (store SN_C, SN_S), then Established
Desired policy: every SYN/ACK must have been preceded by a SYN
Stateful Inspection Firewall
Added state (plus obligation to manage)
– Timeouts
– Size of table
[Diagram: the stateful firewall keeps a state table and filters at the network/transport layers, between Outside and Inside]
e.g., iptables in Linux 2.4
11
Stateful More Expressive
12
Example: TCP handshake (firewall between Inside and Outside)
• SYN (client to server): SN_C = rand_C, AN_C = 0 (firewall records SN_C in its table)
• SYN/ACK (server to client): SN_S = rand_S, AN_S = SN_C (firewall verifies AN_S against the table)
• ACK (client to server): SN = SN_C + 1, AN = SN_S
Server states: Listening, then Wait (store SN_C, SN_S), then Established
A sketch of this check appears below.
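A minimal sketch of the state the firewall keeps for this policy (illustrative Python, following the slide's simplified handshake in which AN_S = SN_C):

# Stateful firewall sketch: every SYN/ACK must be preceded by a SYN.
# Keyed by (client, server) pair; the value is the client's SYN sequence number.
syn_table = {}

def on_syn(client, server, sn_c):
    # Outbound SYN from inside: remember the client's sequence number.
    syn_table[(client, server)] = sn_c

def on_syn_ack(server, client, sn_s, an_s):
    # Inbound SYN/ACK: allow only if it acknowledges a SYN we recorded.
    key = (client, server)
    if key in syn_table and an_s == syn_table[key]:
        return "allow"
    return "drop"   # SYN/ACK with no preceding SYN

on_syn("10.0.0.5", "93.184.216.34", sn_c=1000)
print(on_syn_ack("93.184.216.34", "10.0.0.5", sn_s=7777, an_s=1000))  # allow
print(on_syn_ack("93.184.216.34", "10.0.0.5", sn_s=7777, an_s=4242))  # drop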
State Holding Attack
13
Assume a stateful TCP policy. The attacker sends SYN after SYN at the firewall:
1. SYN flood
2. Exhaust the firewall's state-table resources
3. Sneak a packet through once the table is exhausted
Fragmentation
14
[Diagram: IPv4 header fields: Version, IHL, TOS, Total Length, ID, DF, MF, Fragment Offset, ..., Data]
A datagram's data is split into fragments of, say, n bytes each:
• Frag 1: IP hdr, DF=0, MF=1, ID=0
• Frag 2: IP hdr, DF=0, MF=1, ID=n
• Frag 3: IP hdr, DF=1, MF=0, ID=2n
DF: Don't Fragment (0 = OK to fragment, 1 = don't)
MF: More Fragments (0 = last, 1 = more)
Frag ID = octet number (offset of the fragment's data)
Reassembly
15
[Diagram: the three fragments above are copied back into the data buffer at byte offsets 0, n, and 2n to reconstruct the original data]
16
Overlapping Fragment Attack
[Diagram: TCP header fields: source port, destination port, sequence number, ...]
Packet 1: DF=1, MF=1, ID=0; carries the TCP header with src port 1234, dst port 80
Packet 2: DF=1, MF=1, ID=2; overlaps the TCP header and rewrites the dst port to 22
Assume firewall policy: allow incoming port 80 (HTTP), block incoming port 22 (SSH)
The firewall checks packet 1 (port 80) and passes both; after reassembly the end host sees dst port 22, bypassing the policy (see the sketch below).
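A minimal sketch of why this works against a filter that only checks the first fragment: a naive reassembler lets the later, overlapping fragment overwrite the already-checked destination port (illustrative Python; offsets and byte values are made up):

# Naive reassembler: later fragments overwrite earlier bytes at their offset.
def reassemble(fragments, total_len):
    buf = bytearray(total_len)
    for offset, data in fragments:
        buf[offset:offset + len(data)] = data
    return bytes(buf)

# Fragment 1 carries a TCP header with dst port 80 (bytes 2-3), which the
# firewall's policy allows.  Fragment 2 overlaps bytes 2-3 and rewrites the
# destination port to 22 after the check.
frag1 = (0, bytes([0x04, 0xD2, 0x00, 0x50]) + b"\x00" * 16)  # src 1234, dst 80
frag2 = (2, bytes([0x00, 0x16]))                              # dst rewritten to 22

segment = reassemble([frag1, frag2], total_len=20)
dst_port = int.from_bytes(segment[2:4], "big")
print(dst_port)   # 22: the end host sees SSH, the firewall saw HTTP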
Stateful Firewalls
Pros
• More expressive
Cons
• State-holding attack
• Mismatch between the firewall's understanding of the protocol and the protected hosts' understanding
17
Application Firewall
Check protocol messages directly
Examples:
– SMTP virus scanner
– Proxies
– Application-level callbacks
18
[Diagram: the application firewall keeps state and inspects traffic up through the application layer, between Outside and Inside]
Firewall Placement
19
Demilitarized Zone (DMZ)
20
[Diagram: the firewall separates Inside from Outside; a DMZ off the firewall hosts the public-facing WWW, NNTP, DNS, and SMTP servers]
Dual Firewall
21
[Diagram: an exterior firewall faces Outside, an interior firewall faces Inside, and the DMZ hub sits between the two firewalls]
Design Utilities
Solsoft
Securify
22
References
Elizabeth D. Zwicky, Simon Cooper, and D. Brent Chapman, Building Internet Firewalls
William R. Cheswick, Steven M. Bellovin, and Aviel D. Rubin, Firewalls and Internet Security
23
Intrusion Detection and Prevention Systems
24
Logical Viewpoint
25
[Diagram: an IDS/IPS sits between Inside and Outside; each message m passes through it]
For each message m, either:
• Report m (an IPS additionally drops or logs it)
• Allow m
• Queue m
Overview
• Approach: Policy vs Anomaly
• Location: Network vs. Host
• Action: Detect vs. Prevent
26
Policy-Based IDS
Use pre-determined rules to detect attacks
Examples: regular expressions (Snort), cryptographic hashes (Tripwire, Snort)
27
Example Snort rules:
Detect any fragments less than 256 bytes:
alert tcp any any -> any any (minfrag: 256; msg: "Tiny fragments detected, possible hostile activity";)
Detect an IMAP buffer overflow:
alert tcp any any -> 192.168.1.0/24 143 (content: "|90C8 C0FF FFFF|/bin/sh"; msg: "IMAP buffer overflow!";)
Modeling System Calls [wagner&dean 2001]
28
[Diagram: call-graph automaton with states Entry(f), Exit(f), Entry(g), Exit(g) and transitions labeled getuid(), geteuid(), open(), close(), exit()]
f(int x) {
  if (x) { getuid(); } else { geteuid(); }
  x++;
}
g() {
  fd = open("foo", O_RDONLY);
  f(0); close(fd); f(1);
  exit(0);
}
Execution inconsistent with automata indicates attack
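A minimal sketch of the enforcement idea (illustrative Python; the states and transitions are a hand-built toy for g() above, not the automaton Wagner and Dean derive automatically from source code):

# Toy syscall automaton for g(): open, then f() (getuid or geteuid),
# close, f() again, exit.  Nondeterministic: track the set of live states.
TRANSITIONS = {
    ("start", "open"):         {"after_open"},
    ("after_open", "getuid"):  {"after_f1"},
    ("after_open", "geteuid"): {"after_f1"},
    ("after_f1", "close"):     {"after_close"},
    ("after_close", "getuid"):  {"after_f2"},
    ("after_close", "geteuid"): {"after_f2"},
    ("after_f2", "exit"):      {"done"},
}

def consistent(trace):
    states = {"start"}
    for syscall in trace:
        states = {nxt for s in states for nxt in TRANSITIONS.get((s, syscall), set())}
        if not states:
            return False   # no automaton path explains the trace: flag as attack
    return True

print(consistent(["open", "geteuid", "close", "getuid", "exit"]))  # True
print(consistent(["open", "geteuid", "execve"]))                   # False (attack)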
Anomaly Detection
29
[Diagram: the IDS learns a distribution of "normal" events; each new event is classified as Attack or Safe depending on whether it fits the distribution]
Example: Working Sets
30
[Diagram: over days 1 to 300, Alice's working set of hosts is {reddit, xkcd, slashdot, fark}; on day 300 she contacts a host outside the working set, which is flagged]
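A minimal sketch of working-set detection (illustrative Python; the hosts and the recurrence threshold are made up):

from collections import Counter

class WorkingSetDetector:
    """Flag connections to hosts outside the learned working set."""
    def __init__(self, min_count=3):
        self.counts = Counter()
        self.min_count = min_count   # a host must recur to join the working set

    def train(self, host):
        self.counts[host] += 1

    def working_set(self):
        return {h for h, c in self.counts.items() if c >= self.min_count}

    def is_anomalous(self, host):
        return host not in self.working_set()

det = WorkingSetDetector()
for day in range(300):
    for host in ("reddit", "xkcd", "slashdot", "fark"):
        det.train(host)

print(det.is_anomalous("xkcd"))          # False: inside the working set
print(det.is_anomalous("evil.example"))  # True: outside the working set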
Anomaly Detection
Pros
• Does not require a pre-determined policy (can detect "unknown" threats)
Cons
• Requires that attacks differ measurably from normal traffic
• Learning distributions is hard
31
Automatically Inferring the Evolution of Malicious Activity on the Internet
David Brumley, Carnegie Mellon University
Shobha Venkataraman, AT&T Research
Oliver Spatscheck, AT&T Research
Subhabrata Sen, AT&T Research
<ip1,+> <ip2,+> <ip3,+> <ip4,->
33
[Diagram: Internet topology with a spam haven and Tier 1 ISPs; a stream of labeled IPs arrives]
Labeled IPs come from SpamAssassin, IDS logs, etc.
Evil is constantly on the move.
Goal: characterize regions changing from bad to good (Δ-good) or from good to bad (Δ-bad)
Research Questions
Given a sequence of labeled IPs
1. Can we identify the specific regions on the Internet that have changed in malice?
2. Are there regions on the Internet that change their malicious activity more frequently than others?
34
35
[Diagram: Internet topology with a spam haven, Tier 1 and Tier 2 ISPs, and DSL and CORP edge networks]
Challenges
1. Infer the right granularity
Previous work uses a fixed granularity:
• Per-IP granularity (e.g., Spamcop); per-IP is often not interesting
36
[Diagram: same Internet topology]
Challenges
1. Infer the right granularity
Previous work uses a fixed granularity:
• BGP granularity (e.g., network-aware clusters [KW'00])
37
[Diagram: same Internet topology, annotated with inferred granularities]
Challenges
1. Infer the right granularity
Idea: infer the granularity from the data; some regions get coarse granularity, some medium, and a well-managed network gets fine granularity
38
[Diagram: same Internet topology, with a fixed-memory device monitoring a high-speed link]
Challenges
1. Infer the right granularity
2. We need online algorithms (fixed memory, high-speed link)
Research Questions
Given a sequence of labeled IPs
1. Can we identify the specific regions on the Internet that have changed in malice?
2. Are there regions on the Internet that change their malicious activity more frequently than others?
39
We present
• Δ-Change
• Δ-Motion
Background
1. IP prefix trees
2. The TrackIPTree algorithm
40
41
[Diagram: same Internet topology, with example prefixes 8.1.0.0/16 and 1.2.3.4/32 marked]
IP prefixes: i/d denotes all IP addresses covered by the first d bits of i
Ex: 8.1.0.0/16 = 8.1.0.0 through 8.1.255.255
Ex: 1.2.3.4/32 = one host (all 32 bits)
42
An IP prefix tree is formed by masking each bit of an IP address.
[Diagram: binary tree from 0.0.0.0/0 (the whole net) at the root, through 0.0.0.0/1 and 128.0.0.0/1, then 0.0.0.0/2, 64.0.0.0/2, 128.0.0.0/2, 192.0.0.0/2, and so on, down to /32 leaves such as 0.0.0.0/32 and 0.0.0.1/32 (one host each) under 0.0.0.0/31]
43
A k-IPTree classifier [VBSSS'09] is an IP prefix tree with at most k leaves, each leaf labeled good ("+") or bad ("-").
[Diagram: a 6-IPTree over the prefix tree above, with "+" and "-" labels on its six leaves]
Ex: 64.1.1.1 is bad
Ex: 1.1.1.1 is good
44
TrackIPTree algorithm [VBSSS'09]
In: a stream of labeled IPs ... <ip4,+> <ip3,+> <ip2,+> <ip1,->
Out: a k-IPTree
[Diagram: the labeled IP stream is fed to TrackIPTree, which maintains a labeled prefix tree with nodes at /1, /16, /17, /18]
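A minimal sketch of how a learned k-IPTree is used to classify an address: return the label of the longest matching labeled prefix (illustrative Python; the prefixes and labels are made up, and the online learning of the tree under the k-leaf budget, which is what TrackIPTree does, is not shown):

import ipaddress

# A toy labeled prefix set standing in for a learned k-IPTree:
# each prefix maps to "+" (good) or "-" (bad).
LABELED_PREFIXES = {
    "0.0.0.0/1":   "+",   # low half of the address space: good
    "128.0.0.0/1": "-",   # high half: bad
    "64.0.0.0/2":  "-",   # but 64.0.0.0/2 within the low half is bad
}

def classify(ip_str):
    ip = ipaddress.ip_address(ip_str)
    best_len, best_label = -1, "+"          # default label if nothing matches
    for prefix, label in LABELED_PREFIXES.items():
        net = ipaddress.ip_network(prefix)
        if ip in net and net.prefixlen > best_len:
            best_len, best_label = net.prefixlen, label
    return best_label

print(classify("1.1.1.1"))    # '+'  (good)
print(classify("64.1.1.1"))   # '-'  (bad)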
Δ-Change Algorithm
1. Approach
2. What doesn’t work
3. Intuition
4. Our algorithm
45
46
Goal: identify online the specific regions on the Internet that have changed in malice.
Epoch 1: IP stream s1 produces tree T1; epoch 2: IP stream s2 produces tree T2; ...
[Diagram: T1 (epoch 1) and T2 (epoch 2) are prefix trees whose leaf labels differ]
Δ-Bad: a change from good to bad
Δ-Good: a change from bad to good
47
Goal: identify online the specific regions on the Internet that have changed in malice.
[Diagram: trees T1 and T2 as above]
False positive: misreporting that a change occurred
False negative: missing a real change
48
Goal: identify online the specific regions on the Internet that have changed in malice.
Idea: divide time into epochs and diff
• Use TrackIPTree on labeled IP stream s1 to learn T1
• Use TrackIPTree on labeled IP stream s2 to learn T2
• Diff T1 and T2 to find Δ-Good and Δ-Bad
[Diagram: T1 for epoch 1 and T2 for epoch 2]
Problem: the two trees may learn different granularities!
49
Goal: identify online the specific regions on the Internet that have changed in malice.
Δ-Change algorithm main idea: use the classification errors between T_{i-1} and T_i to infer Δ-Good and Δ-Bad
50
Δ-Change Algorithm
[Diagram: TrackIPTree learns T_{i-1} from stream S_{i-1} and T_i from stream S_i. Hold the older tree fixed as T_old; annotate it with its classification error on S_{i-1} (giving T_old,i-1) and on S_i (giving T_old,i). Compare the (weighted) classification error of the two annotated copies (both based on the same tree) to find Δ-Good and Δ-Bad.]
Comparing (Weighted) Classification Error
51
[Diagram: T_old,i-1 annotated with per-node traffic and accuracy: a /16 root with 200 IPs at 40% accuracy, and nodes with 150 IPs/90%, 110 IPs/95%, 40 IPs/80%, 50 IPs/30%. T_old,i on the same tree: the /16 root now has 170 IPs at 13% accuracy, and nodes with 100 IPs/10%, 80 IPs/5%, 20 IPs/20%, 70 IPs/20%.]
Δ-Change Somewhere
Comparing (Weighted) Classification Error
52
[Diagram: same annotated trees T_old,i-1 and T_old,i as above]
Insufficient Change
Comparing (Weighted) Classification Error
53
[Diagram: same annotated trees T_old,i-1 and T_old,i as above]
Insufficient Traffic
Comparing (Weighted) Classification Error
54
[Diagram: same annotated trees T_old,i-1 and T_old,i as above]
Δ-Change Localized
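A minimal sketch of the comparison step (illustrative Python; the thresholds are made up and the per-node numbers are only loosely adapted from the slides, so this is not the paper's algorithm):

def delta_changes(nodes, min_ips=50, min_drop=0.25):
    """nodes: dict prefix -> ((ips_old, acc_old), (ips_new, acc_new))."""
    flagged = []
    for prefix, ((ips_old, acc_old), (ips_new, acc_new)) in nodes.items():
        if ips_old < min_ips or ips_new < min_ips:
            continue                      # insufficient traffic to judge
        if acc_old - acc_new >= min_drop:
            flagged.append(prefix)        # accuracy collapsed: Δ-change here
    return flagged

# Hypothetical numbers: the /16 drops from 40% to 13% accuracy.
nodes = {
    "x.y.0.0/16":   ((200, 0.40), (170, 0.13)),
    "x.y.0.0/17":   ((150, 0.90), (100, 0.10)),
    "x.y.128.0/17": ((40, 0.80),  (20, 0.20)),   # too little traffic
}
print(delta_changes(nodes))   # ['x.y.0.0/16', 'x.y.0.0/17']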
Evaluation
1. What are the performance characteristics?
2. Are we better than previous work?
3. Do we find cool things?
55
56
Performance
In our experiments, we:
– let k = 100,000 (k-IPTree size)
– processed 30-35 million IPs (one day's traffic)
– used a 2.4 GHz processor
Identified Δ-Good and Δ-Bad in under 22 minutes using less than 3 MB of memory
57
[...] Our experiments were run on a 2.4GHz Sparc64-VI core. Our current (unoptimized) implementation takes 20-22 minutes to process a day's trace (around 30-35 million IP addresses) and requires less than 2-3 MB of memory storage.
We note that the ground truth in our data provides labels for the individual IP addresses, but does not tell us the prefixes that have changed. Thus, our ground truth allows us to confirm that the learned IPTree has high accuracy, but we cannot directly measure the false positive rate and false negative rate of the change-detection algorithms. Thus, our experimental results instead demonstrate that our algorithm can find small changes in prefix behaviour very early on real data, and can do so substantially better than competing approaches. Our operators were previously unaware of most of these ∆-change prefixes, and as a consequence, our summarization makes it easy for operators to both note changes in behaviour of specific entities, as well as observe trends in malicious activity. [7]
4.1 Comparisons with Alternate Approaches
We first compare ∆-Change with previous approaches and direct extensions to previous work. We compare two different possible alternate approaches with ∆-Change: (1) using a fixed set of network-based prefixes (i.e., network-aware clusters, see Sec. 2.2) instead of a customized IPTree, (2) directly differencing the IPTrees instead of using ∆-Change. We focus here on only spam data for space reasons.
Network-aware Clusters. As we described in Section 3.2, our change-detection approach has no false positives: every change we find will indeed be a change in the input data stream. Thus, we only need to demonstrate that ∆-Change finds substantially more ∆-changes than network-aware clusters (i.e., has a lower false negative rate), and therefore is superior at summarizing changes in malicious activity to the appropriate prefixes for operator attention.
We follow the methodology of [29] for labeling the prefixes of the network-aware clusters optimally (i.e., we choose the labeling that minimizes errors), so that we can test the best possible performance of network-aware clusters against ∆-Change. We do this allowing the network-aware clusters multiple passes over the IP addresses (even though ∆-Change is allowed only a single pass), as detailed in [29]. We then use these clusters in place of the learned IPTree in our change-detection algorithms.
We first compare ∆-change prefixes identified by the network-aware clustering and ∆-Change.
[Footnote 7: As discussed in Section 1, our evaluation focuses exclusively on changes in prefix behaviour, since prior work [28, 29] already finds persistent malicious behaviour.]
[Figure 11. Comparing the ∆-Change algorithm with network-aware clusters on the spam data: ∆-Change always finds more prefixes and covers more IPs. Panel (a): number of ∆-change prefixes vs. interval in days; panel (b): IPs in ∆-change prefixes vs. interval in days.]
This comparison cannot be made directly on the prefixes output by the two approaches, as slightly different prefixes may reflect the same underlying change in the data stream, e.g., network-aware clusters might identify a /24 while ∆-Change identifies a /25.
In order to account for such differences, we group together prefixes into distinct subtrees, and match a group from the network-aware clustering to the appropriate group from ∆-Change if at least 50% of the volume of changed IPs in network-aware clustering was accounted for in ∆-Change. In our results, network-aware clustering identified no ∆-change prefixes that were not identified by ∆-Change; otherwise, we would have to do the reverse matching as well. Furthermore, this is what allows us to compare the number of ∆-changes that were identified by both algorithms; otherwise we would not be able to make this comparison.
Fig. 11(a) shows the results of our comparison for 37 days. Network-aware clustering typically finds only a small fraction of the ∆-change prefixes discovered by ∆-Change, ranging from 10%-50%. On average, ∆-Change finds over 2.5 times as many ∆-change prefixes as network-aware clusters. We also compare the number of IPs in ∆-change prefixes identified by the network-aware clustering and ∆-Change in Fig. 11(b). The ∆-change prefixes discovered by ∆-Change typically account for a factor of 3-5x as many IP addresses as those discovered by the network-aware clustering. This indicates that network-aware clustering does not discover many changes that involve a substantial volume of the input data. On many days, especially on days with changes, the fraction of IP addresses not identified by network-aware clusters, however, is still smaller than the fraction of prefixes that it does not identify. This indicates that network-aware clustering identifies the larger, coarser changes, but misses the fine-grained changes.
Network-aware clusters perform so poorly because the prefix granularity required to identify ∆-changes typically does not appear at all in routing tables. Indeed, as our analysis in Section 4.2 shows, a large number of ∆-change prefixes come from hosting providers, many of which do not even appear in BGP prefix tables.
Possible Differencing of IPTrees. We now show that the possible differencing approach described in Section 2.2
How do we compare to network-aware clusters? (By prefix, on the spam data)
2.5x as many changes on average!
58
Grum botnet takedown
59
Botnets: 22.1 and 28.6 thousand new DNSChanger bots appeared; 38.6 thousand new Conficker and Sality bots
Caveats and Future Work
"For any distribution on which an ML algorithm works well, there is another on which it works poorly."
– The "No Free Lunch" Theorem
60
Our algorithm is efficient and works well in practice...
...but a very powerful adversary could fool it into having many false negatives. A formal characterization is future work.
Detection Theory
Base Rate, fallacies, and detection systems
61
62
Let Ω be the set of all possible events. For example:
• Audit records produced on a host
• Network packets seen
63
[Venn diagram: the set of intrusion events I inside Ω]
Set of intrusion events I
Intrusion rate: Pr[I]
Example: the IDS received 1,000,000 packets, and 20 of them corresponded to an intrusion. The intrusion rate is Pr[I] = 20/1,000,000 = 0.00002
64
[Venn diagram: the set of alerts A and the set of intrusions I inside Ω]
Set of alerts A
Alert rate: Pr[A]
Defn: Sound. Every alert corresponds to a real intrusion (no false positives).
65
[Venn diagram: alerts A in relation to intrusions I]
Defn: Complete. Every intrusion raises an alert (no false negatives).
66
[Venn diagram: intrusions I in relation to alerts A]
Defn: False positive: an alert with no intrusion. Defn: False negative: an intrusion with no alert.
Defn: True positive: an intrusion that raises an alert.
Defn: True negative: no intrusion and no alert.
67
[Venn diagram: I and A inside Ω]
Defn: Detection rate: Pr[A|I]
Think of the detection rate as the set of intrusions raising an alert, normalized by the set of all intrusions.
70
[Venn diagram: 18 intrusions raising alerts (I ∩ A), 4 alerts with no intrusion, 2 intrusions with no alert]
72
[Venn diagram: I and A inside Ω]
Defn: Bayesian detection rate: Pr[I|A]. This is the crux of IDS usefulness!
Think of the Bayesian detection rate as the set of intrusions raising an alert, normalized by the set of all alerts (vs. the detection rate, which normalizes by the set of intrusions).
74
[Venn diagram: 18 intrusions raising alerts (I ∩ A), 4 alerts with no intrusion, 2 intrusions with no alert]
About 18% of all alerts (4 out of 22) are false positives!
Challenge
We’re often given the detection rate and know the intrusion rate, and want to calculate the Bayesian detection rate
– 99% accurate medical test
– 99% accurate IDS
– 99% accurate test for deception
– ...
75
Fact: Pr[I|A] = Pr[I ∩ A] / Pr[A]
76
Proof: definition of conditional probability.
Calculating Bayesian Detection Rate
Fact: Pr[I|A] = Pr[I ∩ A] / Pr[A]
So to calculate the Bayesian detection rate:
One way is to compute Pr[A] and Pr[A ∩ I], and take their ratio.
77
Example
• 1,000 people in the city
• 1 is a terrorist, and we have their picture. Thus the base rate of terrorists is 1/1000.
• Suppose we have a new terrorist facial recognition system that is 99% accurate:
  – 99/100 times when someone is a terrorist, there is an alarm
  – For every 100 good guys, the alarm only goes off once
• An alarm went off. Is the suspect really a terrorist?
78
[Figure: a crowd of people, "this times 10", representing the city of 1,000]
Example
Intuitive answer: the facial recognition system is 99% accurate, so there is only a 1% chance the guy is not the terrorist. (As the following slides show, this reasoning is the base rate fallacy.)
79
Formalization
• 1 is a terrorist, and we have their picture. Thus the base rate of terrorists is 1/1000: P[T] = 0.001
• 99/100 times when someone is a terrorist there is an alarm: P[A|T] = 0.99
• For every 100 good guys, the alarm only goes off once: P[A|¬T] = 0.01
• We want to know P[T|A]
80
• Same setup: P[T] = 0.001, P[A|T] = 0.99, P[A|¬T] = 0.01. We want P[T|A].
81
Intuition: given 999 good guys, we have 999 * 0.01 ≈ 9-10 false alarms, against roughly 1 true alarm.
Guesses?
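A minimal sketch of the computation the following slides walk through (illustrative Python):

def bayesian_detection_rate(p_i, p_a_given_i, p_a_given_not_i):
    """P[I|A] = P[A|I]P[I] / (P[A|I]P[I] + P[A|~I]P[~I])."""
    p_a = p_a_given_i * p_i + p_a_given_not_i * (1 - p_i)
    return p_a_given_i * p_i / p_a

# Terrorist example: base rate 1/1000, P[A|T] = 0.99, P[A|~T] = 0.01
print(bayesian_detection_rate(0.001, 0.99, 0.01))   # ~0.09: only ~9% of alarms are real

# IDS example: 20 intrusion packets out of 1,000,000 (Pr[I] = 0.00002)
print(bayesian_detection_rate(2e-5, 0.99, 0.01))    # ~0.002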
82
To compute Pr[I|A] = Pr[A ∩ I] / Pr[A], two quantities are still unknown: Pr[A] and Pr[A ∩ I].
Recall, to get Pr[A]:
Fact: Pr[A] = Pr[A|I] Pr[I] + Pr[A|¬I] Pr[¬I] (law of total probability)
83
Proof: A = (A ∩ I) ∪ (A ∩ ¬I), and the two pieces are disjoint.
..and to get Pr[A ∩ I]:
Fact: Pr[A ∩ I] = Pr[A|I] Pr[I]
85
Proof: definition of conditional probability.
86
With both quantities in hand, the Bayesian detection rate follows; for the terrorist example, Pr[T|A] = (0.99)(0.001) / ((0.99)(0.001) + (0.01)(0.999)) ≈ 0.09.
87
For IDS
Let
– I be an intrusion, A an alert from the IDS
– 1,000,000 messages per day processed
– 2 attacks per day
– 10 messages per attack (so Pr[I] = 20/1,000,000 = 0.00002)
88
[Plot from Axelsson, RAID 99: Bayesian detection rate versus false positive rate, for several true positive (detection) rates]
70% detection requires FP < 1/100,000
80% detection generates 40% FP
From Axelsson, RAID 99
Conclusion
• Firewalls
– 3 types: Packet filtering, Stateful, and Application
– Placement and DMZ
• IDS
– Anomaly vs. policy-based detection
• Detection theory
– Base rate fallacy
89
90
Questions?
END
92
Thought
Overview
• Approach: Policy vs Anomaly
• Location: Network vs. Host
• Action: Detect vs. Prevent
93
Type: Example
Host, Rule, IDS: Tripwire
Host, Rule, IPS: Personal firewall
Net, Rule, IDS: Snort
Net, Rule, IPS: Network firewall
Host, Anomaly, IDS: System call monitoring
Host, Anomaly, IPS: NX bit?
Net, Anomaly, IDS: Working set of connections
Net, Anomaly, IPS