1 Network-based Intrusion Detection, Mitigation and Forensics System Yan Chen Department of Electrical Engineering and Computer Science Northwestern University Lab for Internet & Security Technology (LIST) http://list.cs.northwestern.edu

Network-based Intrusion Detection, Mitigation and Forensics System


Page 1: Network-based Intrusion Detection, Mitigation and Forensics System

1

Network-based Intrusion Detection, Mitigation and Forensics System

Yan Chen

Department of Electrical Engineering and Computer Science

Northwestern University

Lab for Internet & Security Technology (LIST)

http://list.cs.northwestern.edu

Page 2: Network-based Intrusion Detection, Mitigation and Forensics System

2

The Spread of Sapphire/Slammer Worms

Page 3: Network-based Intrusion Detection, Mitigation and Forensics System

3

Current Intrusion Detection Systems (IDS)

• Mostly host-based and not scalable to high-speed networks
  – Slammer worm infected 75,000 machines in <10 mins
  – Host-based schemes inefficient and user dependent
    • Have to install IDS on all user machines!
• Mostly simple signature-based
  – Cannot recognize unknown anomalies/intrusions
  – New viruses/worms, polymorphism

Page 4: Network-based Intrusion Detection, Mitigation and Forensics System

4

Current Intrusion Detection Systems (II)

• Statistical detection
  – Unscalable for flow-level detection
    • IDS vulnerable to DoS attacks
  – Overall traffic based: inaccurate, high false positives
• Cannot differentiate malicious events from unintentional anomalies
  – Anomalies can be caused by network element faults
  – E.g., router misconfiguration, link failures, etc.

Page 5: Network-based Intrusion Detection, Mitigation and Forensics System

5

Network-based Intrusion Detection, Mitigation, and Forensics System

• Online traffic recording [SIGCOMM IMC 2004, INFOCOM 2006, ToN to appear]
  – Reversible sketch for data streaming computation
  – Record millions of flows (GB traffic) in a few hundred KB
  – Small number of memory accesses per packet
  – Scalable to large key space size (2^32 or 2^64)
• Online sketch-based flow-level anomaly detection [IEEE ICDCS 2006] [IEEE CG&A, Security Visualization 06]
  – Adaptively learn the traffic pattern changes
  – As a first step, detect TCP SYN flooding, horizontal and vertical scans even when mixed
• Online stealthy spreader (botnet scan) detection [IWQoS 2007]
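To illustrate how a sketch can record many flows in a fixed, small amount of memory with only a few memory accesses per packet, here is a minimal count-min style sketch. It is a simpler relative of the reversible sketch named on this slide, not that structure itself: it cannot recover the keys of heavy flows. The class name, the SHA-256 based hashing, and the default sizes are assumptions made for illustration.

```python
import hashlib


class CountMinSketch:
    """Fixed-size counter table (illustrative; NOT the reversible sketch)."""

    def __init__(self, rows=4, cols=1024):
        self.rows, self.cols = rows, cols
        # Memory is rows * cols counters, independent of the number of flows.
        self.table = [[0] * cols for _ in range(rows)]

    def _index(self, row, key):
        # One hash function per row, derived by salting the key with the row id.
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.cols

    def update(self, key, value=1):
        # One counter access per row for each packet.
        for r in range(self.rows):
            self.table[r][self._index(r, key)] += value

    def estimate(self, key):
        # Collisions only add to a counter, so the minimum over rows never
        # underestimates the true total when all updates are non-negative.
        return min(self.table[r][self._index(r, key)] for r in range(self.rows))
```

For example, after `sketch.update("10.0.0.1")` is called once per packet of a flow keyed by source IP, `sketch.estimate("10.0.0.1")` returns a value at least as large as the true packet count.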

Page 6: Network-based Intrusion Detection, Mitigation and Forensics System

6

Network-based Intrusion Detection, Mitigation, and Forensics System (II)

Integrated approach for false positive reduction

• Polymorphic worm signature generation & detection [IEEE Symposium on Security and Privacy 2006] [IEEE ICNP 2007 to appear]
• Accurate network diagnostics [ACM SIGCOMM 2006] [IEEE INFOCOM 2007]
• Scalable distributed intrusion alert fusion w/ DHT [SIGCOMM Workshop on Large Scale Attack Defense 2006]
• Large-scale botnet event forensics using honeynet [work in progress]

Page 7: Network-based Intrusion Detection, Mitigation and Forensics System

7

System Architecture

[Figure: system architecture diagram, distinguishing the data path from the control path, and modules on the critical path from modules on the non-critical path.]

• Part I: Sketch-based monitoring & detection
• Part II: Per-flow monitoring & detection
• Modules shown: reversible sketch monitoring, sketch based statistical anomaly detection (SSAD), filtering, per-flow monitoring, signature-based detection, polymorphic worm detection, network fault diagnosis
• Data shown: streaming packet data, local sketch records (sent out for aggregation), remote aggregated sketch records, keys of suspicious flows, keys of normal flows, normal flows, suspicious flows, intrusion or anomaly alarms

Page 8: Network-based Intrusion Detection, Mitigation and Forensics System

8

System Deployment

• Attached to a router/switch as a black box
• Edge network detection particularly powerful

[Figure: three deployment configurations: (a) original configuration, (b) monitor each port separately, (c) monitor aggregated traffic from all ports. Labels: Internet, Router, Switch, LAN, Splitter, scan port, HPNAIDM system, HRAID system.]

Page 9: Network-based Intrusion Detection, Mitigation and Forensics System

Detecting Stealthy Spreaders Using Online Outdegree Histograms

Yan Gao¹, Yao Zhao¹, Robert Schweller¹, Shobha Venkataraman², Yan Chen¹, Dawn Song² and Ming-Yang Kao¹

1. Northwestern University

2. Carnegie Mellon University

Page 10: Network-based Intrusion Detection, Mitigation and Forensics System

10

Outline

• Motivation
• Problem definition
• System design
• Evaluation
• Conclusion

Page 11: Network-based Intrusion Detection, Mitigation and Forensics System

11

Motivation

• High-speed network monitoring
  – Small amount of memory usage
  – Small number of memory accesses per packet
• Superspreaders vs. stealthy spreaders
  – Superspreaders: sources that connect to a large number of distinct destinations
    • e.g. a compromised host doing fast scanning for worm propagation
  – Stealthy spreaders: a number of sources that send more than a certain number of connections (unsuccessful) to distinct destinations
    • e.g. botnet scans or moderate worm propagation

Page 12: Network-based Intrusion Detection, Mitigation and Forensics System

12

Existing Data Streaming Algorithms

• Online entropy estimation approaches: Chakrabarti et al. [STACS 06] and Guha et al. [ACM SODA 06]
  – Entropy: $H = -\sum_{i=1}^{n} \frac{m_i}{m} \log\left(\frac{m_i}{m}\right)$
  – Pros: detect unexpected changes in the network traffic
  – Cons: lose some concrete distribution information
• Online histogram estimation algorithms: Gibbons et al. [VLDB 97] and Gilbert et al. [STOC 02]
  – Pros: provide more information on the features of network traffic
  – Cons: cannot record the number of unique items
• Superspreader detection schemes: Venkataraman et al. [NDSS 05] and Zhao et al. [IMC 05]
  – Pros: detect sources with a very large outdegree
  – Cons: memory usage unscalable to small/medium outdegrees such as bot scans

Superspreader detection is a special case of spreader detection
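As a small illustration of why entropy loses concrete distribution information, the sketch below computes the standard empirical entropy H = -Σ (m_i/m) log(m_i/m) over a list of observed items, where m_i is the count of item i and m is the total. This is the exact offline computation, not the streaming estimators cited above; the function name and the choice of natural logarithm are assumptions.

```python
import math
from collections import Counter


def empirical_entropy(items):
    """Exact empirical entropy (natural log) of a finite list of items."""
    counts = Counter(items)          # m_i for each distinct item
    m = sum(counts.values())         # total number of observations
    return -sum((c / m) * math.log(c / m) for c in counts.values())
```

Two very different traffic mixes can yield the same single number H, which is the "Cons" noted for the entropy-based approaches.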

Page 13: Network-based Intrusion Detection, Mitigation and Forensics System

13

Outline

• Motivation
• Problem definition
• System design
• Evaluation
• Conclusion

Page 14: Network-based Intrusion Detection, Mitigation and Forensics System

14

Problem Definitions

Two high-level problems

• Construct an approximation of the outdegree histogram online

• Directly detect the presence of stealthy spreaders without constructing the complete outdegree histogram

Page 15: Network-based Intrusion Detection, Mitigation and Forensics System

15

Problem Definition

• Input: stream of (Src, Dst) pairs S
• Output
  z --- the base whose powers define the buckets of the histogram (z = 2)

[Figure: histogram. x-axis: number of unique destinations, with bucket boundaries … 2^0, 2^1, 2^2, 2^3, 2^4, 2^5, 2^6, 2^7 …; y-axis: number of sources.]

Page 16: Network-based Intrusion Detection, Mitigation and Forensics System

16

Problem Definition

• Input: stream of (SIP, DIP) pairs S
• Output
  W_i --- the set of sources
  A source s is in W_i if and only if the number of unique destinations that s connects to is in the range [z^i, z^(i+1))

[Figure: histogram. x-axis: number of unique destinations, with bucket boundaries … 2^0, 2^1, 2^2, 2^3, 2^4, 2^5, 2^6, 2^7 …; y-axis: number of sources.]

Page 17: Network-based Intrusion Detection, Mitigation and Forensics System

17

Problem Definition

• Input: stream of (SIP, DIP) pairs S
• Output
  m_i = |W_i|

[Figure: histogram. x-axis: number of unique destinations, with bucket boundaries … 2^0, 2^1, 2^2, 2^3, 2^4, 2^5, 2^6, 2^7 …; y-axis: number of sources.]

Creating an approximate histogram means estimating m_i for each bucket
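For reference, the exact outdegree histogram that the online algorithm approximates can be computed offline as follows. This sketch stores every distinct (source, destination) pair, which is precisely the memory cost the streaming approach is meant to avoid; the function name is hypothetical.

```python
from collections import defaultdict


def outdegree_histogram(pairs, z=2):
    """Return {i: m_i}, where m_i counts the sources whose number of
    unique destinations falls in [z**i, z**(i+1))."""
    dests = defaultdict(set)
    for src, dst in pairs:
        dests[src].add(dst)          # duplicates of the same pair are ignored

    hist = defaultdict(int)
    for dset in dests.values():
        k = len(dset)                # outdegree of this source (k >= 1)
        i = 0
        while z ** (i + 1) <= k:     # find i with z**i <= k < z**(i+1)
            i += 1
        hist[i] += 1
    return dict(hist)
```

With z = 2, a source contacting 3 unique destinations lands in bucket 1 (range [2, 4)), and a source contacting 4 lands in bucket 2 (range [4, 8)).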

Page 18: Network-based Intrusion Detection, Mitigation and Forensics System

18

Contribution

• Study the problem of detecting stealthy spreaders online
  – With constant small memory
  – With a small number of memory accesses per packet
• Design the algorithm to detect stealthy spreaders online by approximating the outdegree histogram
  – Data recording phase
    • Sampling and coupon collection-based algorithms
  – Spreader detection phase
    • Linear regression to find bins where attacks happen
• Show that the change of approximated histogram reveals the presence of anomalies

Page 19: Network-based Intrusion Detection, Mitigation and Forensics System

19

Outline

• Motivation
• Problem definition
• System design
• Evaluation
• Conclusion

Page 20: Network-based Intrusion Detection, Mitigation and Forensics System

20

Recording Phase: Sampling Algorithm

Fast: update a smaller number of counters per packet

[Figure: sampling algorithm. A packet (src, dst) arrives; the example condition 2^-3 ≤ h(src) ≤ 2^-2 is shown, and src is inserted into structures among S0, S1, S2, S3, …, Sd.]

Page 21: Network-based Intrusion Detection, Mitigation and Forensics System

21

Recording Phase: Coupon Collecting Algorithm

Accurate: create a better approximation interim structure

[Figure: coupon collecting algorithm. A packet (src, dst) arrives (the condition 2^-3 ≤ h(src) ≤ 2^-2 is shown); the pairs (src, g0(dst)), (src, g1(dst)), (src, g2(dst)), (src, g3(dst)), …, (src, gd(dst)) are inserted into S0, S1, S2, S3, …, Sd respectively.]

gi(dst): uniform random hash function for hashing dst to an integer in [1, 2^i]
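The recording step on this slide can be sketched roughly as below, under several assumptions: Python sets stand in for the distinct-counting structures S0..Sd, a salted SHA-256 stands in for the uniform random hash functions g_i, and the h(src) condition shown in the figure is not modelled. The names `g` and `CouponRecorder` are hypothetical.

```python
import hashlib


def g(i, dst):
    """Hash dst to an integer in [1, 2**i] (stand-in for a uniform hash g_i)."""
    digest = hashlib.sha256(f"{i}:{dst}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % (2 ** i) + 1


class CouponRecorder:
    def __init__(self, d=8):
        self.d = d
        # Exact sets replace the compact distinct counters of a real system.
        self.S = [set() for _ in range(d + 1)]

    def record(self, src, dst):
        # Structure S_i sees at most 2**i distinct "coupons" per source,
        # so a source's contribution to S_i saturates once its outdegree
        # is large compared with 2**i.
        for i in range(self.d + 1):
            self.S[i].add((src, g(i, dst)))

    def counts(self):
        return [len(s) for s in self.S]
```

Since g_0 always returns 1, the count in S0 equals the number of distinct sources, while higher-index structures retain progressively more information about each source's outdegree.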

Page 22: Network-based Intrusion Detection, Mitigation and Forensics System

22

Spreader Detection Phase

• Outdegree histogram construction
  Interim data structure -> final outdegree histogram
  Using linear programming method
  • Build a convex hull
    [Equation (1): constraints relating L = (l_i) and M = (m_i) through coefficient matrices A = [a_ij] and B = [b_ij], whose entries are expressions in z, i, and j]
  • Other constraints
    [Equation (2): bounds on l_i involving the factors ε and ε', for 0 ≤ i ≤ d, together with sign constraints on l_i and m_i]
  • Find the lower and upper bounds for m_i
• Solution
  – Directly use the interim data structure

Pros: Obtain a reasonably accurate histogram for normal network traffic

Cons: Fail to accurately estimate the outdegree histogram for anomalous traffic

Page 23: Network-based Intrusion Detection, Mitigation and Forensics System

23

System Design

• Change detection
  – The change of the interim data structure of two time intervals: k'_i, obtained from k_{i,1} and k_{i,2}
• Stealthy spreader detection
  k'_i > ch (threshold)
• System architecture

[Figure: system architecture. Recording phase: real traffic stream, current CDC, current SDC, current UDC. Detection phase: online histogram computing, attack detection, detection results.]
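A minimal sketch of the threshold test follows. The slide does not make clear how the change k'_i is computed from the two intervals, so this example assumes a plain difference between the current and previous values of structure i; the function and parameter names are hypothetical.

```python
def changed_buckets(prev, curr, threshold):
    """Return the indices i whose change (assumed here to be
    curr[i] - prev[i]) exceeds the threshold."""
    return [
        i
        for i, (before, after) in enumerate(zip(prev, curr))
        if after - before > threshold
    ]
```

For instance, with previous values [10, 10, 10], current values [11, 30, 9], and a threshold of 5, only index 1 is flagged.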

Page 24: Network-based Intrusion Detection, Mitigation and Forensics System

24

Spreader Detection Phase

• The real scan event

[Figure: the real scan event. x-axis: number of distinct destinations; y-axis: number of scanners. Annotations: "One Peak", "Close to 0".]

Page 25: Network-based Intrusion Detection, Mitigation and Forensics System

25

Spreader Detection Phase

• Linear regression for coupon collecting algorithm
  – Mean squared error as the fitting metric

[Figure: example of linear regression. x-axis: bucket; y-axis: value of counting.]
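The fitting metric can be illustrated with an ordinary least-squares line fit over (bucket, counter value) points and its mean squared error. How the fitted lines are then used to locate the attacked bins is not specified on the slide and is not shown here; the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # requires at least two distinct xs
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(xs, ys, slope, intercept):
    """Average squared residual of the points around the fitted line."""
    return sum(
        (y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)
    ) / len(xs)
```

A low error means the counter values follow a straight line across buckets; a bucket range that fits poorly is a candidate for closer inspection.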

Page 26: Network-based Intrusion Detection, Mitigation and Forensics System

26

Outline

• Motivation
• Problem definition
• System design
• Evaluation
• Conclusion

Page 27: Network-based Intrusion Detection, Mitigation and Forensics System

27

Evaluation Methodology

• Traffic traces
  – OC-48 CAIDA data on Aug. 14th, 2002
  – The average packet rate: 191K/s
  – The average flow rate: 3.75K/s
• A real scanning event collected from one class B honeynet on Jan 7th, 2007
  – Port 23
  – 2.5 hours
  – 1,607 unique sources
  – 1,700,236 scan sessions
• Synthetic scanning traces

Page 28: Network-based Intrusion Detection, Mitigation and Forensics System

28

Simulation Results

• Synthetic stealthy scan

[Figure: the estimate ratio of scan outdegree. x-axis: estimate ratio; y-axis: percentage of detection results.]

Annotations on the figure:
– False negative: 17.8%; the estimation error within 20%: 33.9%
– False negative: 0; the estimation error within 20%: 76.1%

$\text{Estimate ratio} = \frac{\text{Estimated outdegree}}{\text{Real spreader outdegree}}$

$\text{Attack intensity} = \frac{\text{Total scan flows}}{\text{Total flows of normal traffic}}$

Page 29: Network-based Intrusion Detection, Mitigation and Forensics System

29

Simulation Results

• Synthetic stealthy scan

[Figure: CDF of estimate ratio for spreader intensity estimation. x-axis: estimate ratio; y-axis: cumulative percentage (%). Annotated values: 35% and 80%.]

Page 30: Network-based Intrusion Detection, Mitigation and Forensics System

30

Simulation Results

• Real stealthy scan

[Figure: the histogram of outdegree of scanners collected in the honeynet. x-axis: number of distinct destinations; y-axis: number of scanners. Annotations: estimation 90, ground truth 87.]

Page 31: Network-based Intrusion Detection, Mitigation and Forensics System

31

Simulation Results

• Real stealthy scan

[Figure: CDF of estimate ratios of scan outdegree estimation. x-axis: estimate ratio; y-axis: cumulative percentage (%). Annotated value: 80%.]

Mix the 5-min data of a real scanning event with 5-min normal traffic of CAIDA data (distribution over 30 such intervals)

Page 32: Network-based Intrusion Detection, Mitigation and Forensics System

32

Online Performance

• Memory consumption
  – Our method: O(c log(m))
    • Constant memory: 24×1KB = 24KB
  – Superspreader: $O\left(\frac{N}{k}\ln\frac{1}{\delta}\right)$
    • When k is small, the memory usage is closer to the size of the entire data stream N.
• Memory access per packet
  – Single memory access per packet for each distinct counting structure
  – Speed up: processing in parallel or in pipeline
• Speed
  – 3.2GHz Pentium 4 computer
  – Recording: 200 seconds for each 5-min CAIDA data interval
  – Detection: less than 0.1 second
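The reported figures can be turned into a rough throughput estimate. This back-of-the-envelope calculation assumes that the 191K packets/s average rate quoted for the CAIDA trace holds for each 5-minute interval, and reads "24×1KB" as 24 structures of 1 KB each; both readings are assumptions, as are the variable names.

```python
AVG_PACKET_RATE = 191_000    # packets per second (average rate of the CAIDA trace)
INTERVAL_SECONDS = 5 * 60    # length of one recorded interval
RECORDING_SECONDS = 200      # reported recording time per interval
NUM_STRUCTURES = 24          # assumed reading of "24 x 1KB"
KB_PER_STRUCTURE = 1

# Packets in one interval, and how fast the recorder must have consumed them.
packets_per_interval = AVG_PACKET_RATE * INTERVAL_SECONDS
throughput = packets_per_interval / RECORDING_SECONDS

# How much faster than real time the recording phase runs.
realtime_factor = INTERVAL_SECONDS / RECORDING_SECONDS

memory_kb = NUM_STRUCTURES * KB_PER_STRUCTURE
```

Under these assumptions the recording phase handles about 57.3 million packets per interval at roughly 286,500 packets per second, which is 1.5 times faster than real time, using 24 KB of memory.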

Page 33: Network-based Intrusion Detection, Mitigation and Forensics System

33

Conclusion

• Propose the stealthy spreader detection problem
• Design an online outdegree histogram based stealthy spreader detection algorithm
  – Propose two randomized algorithms for recording phase
  – Propose the linear regression based approach for stealthy spreader detection

Page 34: Network-based Intrusion Detection, Mitigation and Forensics System