On the Utility of Anonymized Flow Traces for Anomaly Detection. Authors: Martin BURKHART, Daniela BRAUCKHOFF, Martin MAY. Journal: ITC SS 2008. Advisor: Yuh-Jye Lee. Reporter: Yi-Hsiang Yang. Email: [email protected]. 2011/2/14

On the Utility of Anonymized Flow Traces for Anomaly Detection


Page 1: On the Utility of Anonymized Flow Traces for Anomaly Detection

On the Utility of Anonymized Flow Traces for Anomaly Detection

Authors: Martin BURKHART, Daniela BRAUCKHOFF, Martin MAY
Journal: ITC SS 2008
Advisor: Yuh-Jye Lee
Reporter: Yi-Hsiang Yang
Email: [email protected]

2011/2/14

Page 2: On the Utility of Anonymized Flow Traces for Anomaly Detection

Contributions

• Introduce a generic methodology for evaluating the impact of anonymization
• Quantify the utility of anonymized data for a three-week-long data set
• Present an overall estimate for the impact of anonymization

Page 3: On the Utility of Anonymized Flow Traces for Anomaly Detection

Outline

• Introduction
• Methodology
• Measurement Results
• Conclusion

Page 4: On the Utility of Anonymized Flow Traces for Anomaly Detection

Introduction

• Sharing of traffic data is hindered
  • Releasing data introduces a threat to users' privacy
• Anomaly detection
  • Detection methods have been evaluated with anonymized data
• Focus on the anonymization of IP addresses
  • Blackmarking
  • Truncation
  • Random permutation
  • (Partial) prefix-preserving permutation

Page 5: On the Utility of Anonymized Flow Traces for Anomaly Detection

Utility of Anonymized Data for Anomaly Detection

• Granularity design space has two dimensions
  • Subset size: the size of the network (subnet) that is to be analyzed
  • Resolution: the address granularity at which the traffic is analyzed
• Assume the whole design space is available

Page 6: On the Utility of Anonymized Flow Traces for Anomaly Detection

• Cell 1 [00,00]: Select all traffic and set the resolution to the minimum.
• Cell 5 [00,16]: Select all traffic and set the resolution to /16 networks.
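The two cells above can be read as a pair of prefix lengths: the first selects the subset, the second sets the resolution. The following is a minimal sketch under that assumption; the function names and the flow input format are hypothetical and not taken from the slides.

```python
import ipaddress
from collections import Counter

def aggregate(addr: str, resolution: int) -> str:
    """Map an IPv4 address to its enclosing network at the given prefix length."""
    return str(ipaddress.ip_network(f"{addr}/{resolution}", strict=False))

def cell_view(flows, subset: str, resolution: int) -> Counter:
    """Count flows per aggregated address for one design-space cell.

    flows: iterable of IPv4 address strings (hypothetical input format).
    subset: network to analyze; "0.0.0.0/0" selects all traffic.
    resolution: prefix length at which addresses are told apart.
    """
    selected = ipaddress.ip_network(subset)
    counts = Counter()
    for addr in flows:
        if ipaddress.ip_address(addr) in selected:
            counts[aggregate(addr, resolution)] += 1
    return counts

flows = ["10.1.2.3", "10.1.200.9", "10.2.0.1"]
cell_00_00 = cell_view(flows, "0.0.0.0/0", 0)    # all traffic, one bucket
cell_00_16 = cell_view(flows, "0.0.0.0/0", 16)   # all traffic, one bucket per /16
```

At resolution 0 every address falls into the same bucket, while at resolution 16 the first two example addresses share a /16 network and the third does not.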

Page 7: On the Utility of Anonymized Flow Traces for Anomaly Detection

IP address anonymization techniques

• Blackmarking (BM)
  • Blindly replaces all IP addresses in a trace with the same value
• Truncation (TR{t})
  • Replaces the t least significant bits of an IP address with 0
• Random permutation (RP)
  • Translates IP addresses using a random permutation
• Partial prefix-preserving permutation (PPP{p})
  • Permutes the host and network part of IP addresses independently
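The four techniques on this slide can be sketched on 32-bit integers as follows. This is an illustration only: the constant used for blackmarking, the seeded lazy permutation, and the reading of p as the length of the network part are assumptions, and all names are hypothetical.

```python
import random

MASK32 = 0xFFFFFFFF

def blackmark(addr: int) -> int:
    """BM: every address maps to the same value (0 is an arbitrary choice)."""
    return 0

def truncate(addr: int, t: int) -> int:
    """TR{t}: set the t least significant bits to 0."""
    return addr & (MASK32 << t) & MASK32

class LazyPermutation:
    """Random one-to-one mapping on `bits`-bit integers, built on demand."""

    def __init__(self, bits: int, seed: int):
        self.size = 1 << bits
        self.rng = random.Random(seed)
        self.forward = {}
        self.used = set()

    def __call__(self, value: int) -> int:
        if value not in self.forward:
            # Draw until an unused output is found, so the mapping stays injective.
            candidate = self.rng.randrange(self.size)
            while candidate in self.used:
                candidate = self.rng.randrange(self.size)
            self.forward[value] = candidate
            self.used.add(candidate)
        return self.forward[value]

def make_rp(seed: int) -> LazyPermutation:
    """RP: one random permutation over the full 32-bit address."""
    return LazyPermutation(32, seed)

def make_ppp(p: int, seed: int):
    """PPP{p}: permute the p-bit network part and the host part independently."""
    host_bits = 32 - p
    net_perm = LazyPermutation(p, seed)
    host_perm = LazyPermutation(host_bits, seed + 1)

    def anonymize(addr: int) -> int:
        net = addr >> host_bits
        host = addr & ((1 << host_bits) - 1)
        return (net_perm(net) << host_bits) | host_perm(host)

    return anonymize
```

With this sketch, two addresses in the same /p network keep a common anonymized network part under PPP{p}, whereas RP gives no such guarantee.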

Page 8: On the Utility of Anonymized Flow Traces for Anomaly Detection

IP address anonymization techniques

• Prefix-preserving permutation (PP)
  • Permutes IP addresses so that two addresses sharing a common real prefix also share an anonymized prefix of the same length
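One well-known way to obtain a prefix-preserving mapping is to flip each address bit with a pseudorandom bit that depends only on the preceding bits. The sketch below uses HMAC-SHA256 as that pseudorandom function; this choice, the key, and the function names are assumptions for illustration and are not stated in the slides.

```python
import hashlib
import hmac

def prefix_preserving(addr: int, key: bytes) -> int:
    """Anonymize a 32-bit address so that shared prefixes stay shared."""
    result = 0
    for i in range(32):
        prefix = addr >> (32 - i)  # the i most significant bits (0 when i == 0)
        # Pseudorandom bit derived from the key, the position, and the prefix only.
        msg = i.to_bytes(1, "big") + prefix.to_bytes(4, "big")
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        bit = (addr >> (31 - i)) & 1
        result = (result << 1) | (bit ^ flip)
    return result

def common_prefix_len(a: int, b: int) -> int:
    """Number of leading bits two 32-bit addresses have in common."""
    return 32 - (a ^ b).bit_length()

key = b"example-key"  # hypothetical key
a, b = 0xC0A80101, 0xC0A801FE  # 192.168.1.1 and 192.168.1.254
same = common_prefix_len(prefix_preserving(a, key), prefix_preserving(b, key))
```

Because the flip for bit i depends only on bits before i, two inputs that agree on their first k bits produce outputs that agree on exactly their first k bits.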

Page 9: On the Utility of Anonymized Flow Traces for Anomaly Detection
Page 10: On the Utility of Anonymized Flow Traces for Anomaly Detection

Methodology

• Data captured from the four border routers of the Swiss Academic and Research Network
  • IP address range contains about 2.4 million IP addresses
  • Traffic volume varies between 60 and 140 million NetFlow records per hour
  • Analyzed a three-week period (from August 19th to September 10th 2007), 713 Terabytes
  • Un-sampled and non-anonymized flow data

Page 11: On the Utility of Anonymized Flow Traces for Anomaly Detection

Methodology-Ground Truth

• Visual inspection of metric timeseries
  • Computed the timeseries for five well-known metrics: byte, packet, and flow counts, unique IP address counts, and the Shannon entropy of flows per IP address
  • At 15-minute intervals
  • 2016 data points per metric
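The entropy metric and the number of data points can be checked with a short sketch. It assumes entropy measured in bits (base-2 logarithm) and a period of exactly 21 days; the function name and the sample data are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(flows_per_ip: Counter) -> float:
    """Shannon entropy (in bits) of the distribution of flows over IP addresses."""
    total = sum(flows_per_ip.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in flows_per_ip.values())

# 15-minute intervals over three weeks.
INTERVALS_PER_DAY = 24 * 60 // 15
points_per_metric = 21 * INTERVALS_PER_DAY

# Flows spread evenly over four addresses versus concentrated on one.
spread = shannon_entropy(Counter({"a": 5, "b": 5, "c": 5, "d": 5}))
focused = shannon_entropy(Counter({"a": 20}))
```

An even spread over four addresses gives 2 bits, while traffic concentrated on a single address gives 0, which is why a drop in destination entropy points to traffic converging on few targets.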

Page 12: On the Utility of Anonymized Flow Traces for Anomaly Detection

Methodology-Ground Truth

• Assigning ground truth to each interval
  • If the analyzed metric timeseries exposed an unusual event, that interval was classified as anomalous
• Identifying the anomaly type
  • Assigned the anomalous events to different types
    • Volume: a sharp increase or decrease in the volume-based metrics
    • (D)DoS: drop in the destination IP address entropy

Page 13: On the Utility of Anonymized Flow Traces for Anomaly Detection

Methodology-Ground Truth

• Scan: increase in the destination IP address count and entropy
• Network Fluctuation: causes an increase or decrease in the IP address counts at the highest resolution
• Unknown

Page 14: On the Utility of Anonymized Flow Traces for Anomaly Detection

Methodology-Anomaly Detection

• Use Kalman filter
  • Efficient recursive filter
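The slide does not give the filter model, so the following is only a minimal sketch of how a recursive Kalman filter can drive anomaly detection on one metric timeseries. It assumes a scalar random-walk state model, hypothetical noise variances q and r, and a fixed residual threshold.

```python
from typing import List

def kalman_residuals(series: List[float], q: float = 1.0, r: float = 10.0) -> List[float]:
    """Scalar Kalman filter with a random-walk state model.

    q: process noise variance, r: measurement noise variance (assumed values).
    Returns the innovation (measurement minus prediction) for each point.
    """
    x, p = series[0], 1.0  # initial state estimate and its variance
    residuals = []
    for z in series:
        p = p + q                # predict: state unchanged, uncertainty grows
        innovation = z - x       # how far the measurement is from the prediction
        k = p / (p + r)          # Kalman gain
        x = x + k * innovation   # update the estimate with the new measurement
        p = (1 - k) * p
        residuals.append(innovation)
    return residuals

def flag_anomalies(series: List[float], threshold: float) -> List[int]:
    """Indices whose prediction error exceeds the threshold in absolute value."""
    return [i for i, e in enumerate(kalman_residuals(series)) if abs(e) > threshold]

# A flat series with a single spike at index 20.
series = [100.0] * 20 + [500.0] + [100.0] * 20
flagged = flag_anomalies(series, threshold=200.0)
```

Each step needs only the previous estimate and its variance, which is what makes the filter recursive and cheap to run per interval.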

Page 15: On the Utility of Anonymized Flow Traces for Anomaly Detection

Methodology

• The 60 studied metrics are different variants of
  • Three volume-based metrics (vbm): byte, packet, and flow counts
  • Two feature-based metrics (fbm): unique IP address count and Shannon entropy of flows per IP address
• Total: (3[vbm] + (2[fbm] × 2[src/dst] × 3[res])) × 2[in/out] × 2[udp/tcp] = 60 detection metrics
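The count of 60 can be verified by enumerating the variants. In this sketch the three resolution labels are placeholders, since the slide does not name the resolutions; the other labels follow the slide.

```python
from itertools import product

volume = ["bytes", "packets", "flows"]        # vbm
feature = ["unique_ip_count", "flow_entropy"]  # fbm
endpoints = ["src", "dst"]
resolutions = ["res_a", "res_b", "res_c"]      # placeholder labels (assumption)
directions = ["in", "out"]
protocols = ["udp", "tcp"]

# Volume metrics have no endpoint or resolution variants; feature metrics do.
base = [(m,) for m in volume] + list(product(feature, endpoints, resolutions))

# Every base metric is computed per direction and per protocol.
metrics = [b + (d, p) for b, d, p in product(base, directions, protocols)]
total = len(metrics)
```

The base list holds 3 + 12 = 15 entries, and the direction and protocol split multiplies that by 4.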

Page 16: On the Utility of Anonymized Flow Traces for Anomaly Detection

Methodology


Page 17: On the Utility of Anonymized Flow Traces for Anomaly Detection

Measurement Results


Page 18: On the Utility of Anonymized Flow Traces for Anomaly Detection

Measurement Results

• Volume anomalies
  • Exposed by volume-based metrics
  • For TCP, blackmarking and random permutation perform slightly better

Page 19: On the Utility of Anonymized Flow Traces for Anomaly Detection

Measurement Results

• Scanning and denial of service anomalies
  • Feature-based metrics

Page 20: On the Utility of Anonymized Flow Traces for Anomaly Detection

Measurement Results

• Network fluctuations
  • Feature-based metrics at lower resolutions

Page 21: On the Utility of Anonymized Flow Traces for Anomaly Detection

Measurement Results-AUC


Page 22: On the Utility of Anonymized Flow Traces for Anomaly Detection

Measurement Results

• Blackmarking
  • Decreases the utility for detecting anomalies in UDP and TCP traffic, except for volume anomalies
• Random permutation
  • Performs very poorly at detecting anomalies in UDP traffic
  • Preserves the utility for TCP traffic

Page 23: On the Utility of Anonymized Flow Traces for Anomaly Detection

Measurement Results

• Truncation of 8 or 16 bits
  • Decreases the utility for detecting anomalies in TCP traffic by roughly 10 percent
  • Performs well for UDP traffic
• (Partial) prefix-preserving permutation
  • No significant negative impact on detecting anomalies in UDP and TCP traffic

Page 24: On the Utility of Anonymized Flow Traces for Anomaly Detection

Implicit Traffic Aggregation

• Analyzing the count of additional flows for 170 webservers
  • Truncating a single bit: around 10% of the webservers see a traffic increase of 100% or more, and 50% see no additional traffic
  • Unaffected servers: 20% for 2 bits, 5% for 4 bits, and even 0% for 8 bits
  • At least a doubling of traffic: 25% for 2 bits, 55% for 4 bits, and 89% for 8 bits
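The aggregation effect can be illustrated with a small sketch: after truncating t bits, a server is indistinguishable from every other address sharing its remaining prefix, so their flows are counted together. The function names and flow counts below are invented for illustration and are not the measured data.

```python
from collections import Counter

def truncate(addr: int, t: int) -> int:
    """TR{t}: set the t least significant bits of the address to 0."""
    return (addr >> t) << t

def additional_flow_ratio(flow_counts: Counter, server: int, t: int) -> float:
    """Relative traffic increase attributed to `server` after truncating t bits.

    Assumes the server itself has at least one flow.
    """
    own = flow_counts[server]
    merged = sum(n for addr, n in flow_counts.items()
                 if truncate(addr, t) == truncate(server, t))
    return (merged - own) / own

# Hypothetical flow counts for three neighboring addresses.
flow_counts = Counter({0x0A000050: 100, 0x0A000051: 100, 0x0A000060: 50})
server = 0x0A000050

one_bit = additional_flow_ratio(flow_counts, server, 1)    # merges with ...51
eight_bits = additional_flow_ratio(flow_counts, server, 8)  # merges with all three
```

In this toy data a single truncated bit already doubles the traffic seen for the server, because its direct neighbor carries the same number of flows.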

Page 25: On the Utility of Anonymized Flow Traces for Anomaly Detection

Conclusion

• Anonymization techniques impact statistical anomaly detection

• Introduced the detection granularity design space

• Analyzed the utility of anonymized traces

Page 26: On the Utility of Anonymized Flow Traces for Anomaly Detection

Thanks for your attention

Q&A