LD-Sketch: A Distributed Sketching Design for Accurate and Scalable Anomaly Detection in Network Data Streams
Qun Huang and Patrick P. C. Lee
The Chinese University of Hong Kong, Hong Kong
INFOCOM’14
Motivation
Network traffic: a stream of (key, value) tuples
• Keys: src IPs, five-tuple flows
• Value: # of packets, payload bytes
Heavy keys: classical anomalies in network traffic
• Heavy hitters: keys with large volume in one period
  • e.g., SLA violation
• Heavy changers: keys with large volume change across two periods
  • e.g., DoS attacks, component failures
Goal:
• Identify heavy keys in real time
Challenges
Enormous key space
• e.g., 5-tuple IPv4 flows are drawn from a key domain of size $2^{104}$
• Per-key tracking is infeasible
Line-rate processing
• A single machine fails to keep pace with the line rate
Seamless distributed detection
• Apply single-machine detection in a distributed architecture
• Open issue:
  • How to achieve both scalability and accuracy?
Related Works
Counter-based techniques
• Misra-Gries algorithm [Misra & Gries 82]; Lossy Counting [Manku et al. 02]; Space Saving [Metwally et al. 05]; Probabilistic Lossy Counting [Dimitropoulos et al. 08]
• Only address heavy hitter detection on a single machine
Sketch-based techniques
• Multi-stage filter [Estan et al. 03]; CGT [Cormode et al. 04]; Reversible Sketch [Schweller et al. 06]; SeqHash [Tian et al. 07]; Fast Sketch [Liu et al. 12]
• Only work on a single machine
Distributed detection
• [Cormode et al. 2005]
• [Manjhi et al. 2005]
• [Yi et al. 2009]
• Only address heavy hitter detection
Our Work
LD-Sketch: a new sketching design for heavy key detection in a distributed architecture
A sketch technique for local detection
• High accuracy
• High speed
• Low space complexity
A distributed detection scheme that not only achieves scalability but also improves accuracy
Experiments on real-world traces
Problem Formulation
Perform detection in each time period (epoch)
Input data: a stream of (key, value) tuples
True sum $S(x)$:
• sum of the values of key $x$ in the time period
True change $D(x)$:
• absolute value of the difference of $S(x)$ between the current and last epochs
Heavy hitters: all $x$ with $S(x) \ge \phi$
Heavy changers: all $x$ with $D(x) \ge \phi$
Problem: infeasible to track $S(x)$ and $D(x)$ in real time with limited memory
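As a point of reference for the definitions above, the following minimal Python sketch computes true sums, heavy hitters, and heavy changers exactly by tracking every key. It is only an illustration: the function names, the threshold name `phi`, the use of `>=` as the comparison, and the toy streams are assumptions, and per-key tracking like this is precisely what becomes infeasible at line rate.

```python
from collections import Counter

def true_sums(stream):
    """Sum the values of each key over one epoch (exact per-key tracking)."""
    sums = Counter()
    for key, value in stream:
        sums[key] += value
    return sums

def heavy_hitters(sums, phi):
    """Keys whose true sum in one epoch reaches the threshold phi."""
    return {x for x, s in sums.items() if s >= phi}

def heavy_changers(prev_sums, curr_sums, phi):
    """Keys whose absolute change across two epochs reaches phi."""
    keys = set(prev_sums) | set(curr_sums)
    return {x for x in keys
            if abs(curr_sums.get(x, 0) - prev_sums.get(x, 0)) >= phi}

# Toy streams of (key, value) tuples for two consecutive epochs.
epoch1 = [("a", 5), ("b", 1), ("a", 6)]
epoch2 = [("a", 1), ("b", 2), ("c", 12)]
s1, s2 = true_sums(epoch1), true_sums(epoch2)
hitters1 = heavy_hitters(s1, phi=10)
changers = heavy_changers(s1, s2, phi=10)
```

Here key `a` is a heavy hitter in the first epoch and, together with `c`, a heavy changer across the two epochs.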
Architecture
[Figure: data sources feed remote sites, which distribute the data to workers. Each worker runs local detection; distributed detection combines the local detection results into the final detection results.]
Local Detection
LD-Sketch: structure of $r$ rows, with $w$ buckets each
Update phase
• For each data item $(x, v_x)$:
  • select a bucket for row $i$ by hashing key $x$ with function $h_i$
  • update the bucket with the data item
Detection phase
• Examine the buckets and report heavy keys
[Figure: a key is hashed by $h_1, h_2, \dots, h_r$ to one bucket in each of the $r$ rows.]
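The bucket selection in the update phase can be illustrated with a short Python sketch. The slides do not say which hash functions are used, so the salted SHA-256 construction below, the function names, and the parameter names `r` (rows) and `w` (buckets per row) are assumptions made for illustration.

```python
import hashlib

def bucket_index(key, row, w):
    """Hypothetical h_row: hash the key together with the row number into [0, w)."""
    digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % w

def select_buckets(key, r, w):
    """Return the (row, column) of the single bucket chosen in each of the r rows."""
    return [(i, bucket_index(key, i, w)) for i in range(r)]

# One bucket per row is selected for the same key.
cells = select_buckets("10.0.0.1", r=4, w=16)
```

Every data item therefore touches exactly `r` buckets, one per row, and the same key always lands in the same buckets.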
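Inside a Bucket
Bucket $(i, j)$
• Total sum: $V_{i,j}$
• Array $A_{i,j}$ of (key, value) counters: (key 1, value), (key 2, value), ..., empty slots
• Length of $A_{i,j}$: $l_{i,j}$
• Error: $e_{i,j}$
• Expansion parameter: $T$
Basic ideas
• Track significant keys in a bucket with array $A_{i,j}$
• Increment length $l_{i,j}$ based on total sum $V_{i,j}$ and parameter $T$
• Record error $e_{i,j}$ due to dropping insignificant keys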
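Update Bucket $(i, j)$ with $(x, v_x)$
Four cases
Case 1: $x$ is in $A_{i,j}$
• Update directly: add $v_x$ to the counter of $x$
Case 2: $x$ is not in $A_{i,j}$, but $A_{i,j}$ has empty slots
• Insert key $x$ into $A_{i,j}$, and set its counter to $v_x$
Cases 3 & 4: $x$ is not in $A_{i,j}$, and $A_{i,j}$ is full
• Compute the expansion number $k$
• Based on $k$ and $l_{i,j}$:
  • Case 3: decrement keys in $A_{i,j}$
  • Case 4: expand $A_{i,j}$ dynamically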
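The four cases can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the names `Bucket`, `update_bucket`, and `T`, adding every value to the total sum, the rule `k = V // T` for the expansion number, and the target length `(k + 1) * (k + 2) - 1` are assumptions, chosen to be consistent with the lengths 5 and 11 in the Dynamic Expansion example and with the numbers in the Decrement Keys example.

```python
from dataclasses import dataclass, field

@dataclass
class Bucket:
    V: int = 0                             # total sum of all values seen
    A: dict = field(default_factory=dict)  # tracked keys -> counters
    l: int = 1                             # current capacity of A (assumed start)
    e: int = 0                             # accumulated decrement error

def update_bucket(b, x, v, T):
    """Apply one data item (x, v) to bucket b; T is the expansion parameter."""
    b.V += v
    if x in b.A:                           # Case 1: key already tracked
        b.A[x] += v
    elif len(b.A) < b.l:                   # Case 2: empty slot available
        b.A[x] = v
    else:
        k = b.V // T                       # assumed expansion number
        target = (k + 1) * (k + 2) - 1     # assumed target length
        if target <= b.l:                  # Case 3: decrement keys
            dec = min(v, min(b.A.values()))
            b.e += dec
            b.A = {y: c - dec for y, c in b.A.items() if c - dec > 0}
            if v > dec:
                b.A[x] = v - dec
        else:                              # Case 4: expand dynamically
            b.l = target
            b.A[x] = v
    return b

# Case 3 example from the slides: counter y = 5, length 1, error 2, new value 8.
b = Bucket(V=7, A={"y": 5}, l=1, e=2)
update_bucket(b, "x", 8, T=100)
print(b.A, b.e)  # {'x': 3} 7
```

The starting total `V=7` and `T=100` are hypothetical values picked only so that the bucket stays in Case 3.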
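Decrement Keys
Case 3: Example
• Bucket: $A_{i,j} = \{(y, 5)\}$, $l_{i,j} = 1$, $e_{i,j} = 2$
• New data item $(x, v_x)$
Procedure
• Step 1: calculate decrement value $\hat{e}$
$$\hat{e} = \begin{cases} 3, & \text{if } v_x = 3 \\ 5, & \text{if } v_x = 5 \\ 5, & \text{if } v_x = 8 \end{cases}$$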
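Decrement Keys
Procedure (cont.)
• Step 2: Update $e_{i,j}$ by adding $\hat{e}$
$$e_{i,j} = \begin{cases} 5, & \text{if } v_x = 3 \\ 7, & \text{if } v_x = 5 \\ 7, & \text{if } v_x = 8 \end{cases}$$
• Step 3: Update $A_{i,j}$
  • Decrement the counter of $y$ by $\hat{e}$, for all $y$ in $A_{i,j}$
  • Remove all $y$ whose counter is no longer positive
  • Insert key $x$ with counter $v_x - \hat{e}$ if $v_x > \hat{e}$
Contents of $A_{i,j}$
• Before: $(y, 5)$
• After, $v_x = 3$: $(y, 2)$
• After, $v_x = 5$: empty
• After, $v_x = 8$: $(x, 3)$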
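Dynamic Expansion
Case 4:
• Add new counters to $A_{i,j}$
• Set $l_{i,j}$ to the expanded length
• Insert key $x$ with counter $v_x$
Example
• Before: $l_{i,j} = 5$, and $A_{i,j}$ holds $y_1, y_2, y_3, y_4, y_5$
• After: $l_{i,j} = 11$, and $A_{i,j}$ holds $y_1, y_2, y_3, y_4, y_5$ and $x$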
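The jump from length 5 to length 11 in the example fits a quadratic growth rule. The sketch below assumes, hypothetically, that the target length is `(k + 1) * (k + 2) - 1` with `k = V // T`; this rule reproduces the two lengths shown here but is not stated in this transcript.

```python
def target_length(V, T):
    """Assumed target length of the array for total sum V and expansion parameter T."""
    k = V // T                       # assumed expansion number
    return (k + 1) * (k + 2) - 1

# Target lengths as the total sum crosses successive multiples of T = 10.
lengths = [target_length(V, T=10) for V in (5, 15, 25, 35)]
print(lengths)  # [1, 5, 11, 19]
```

Under this assumption the array only grows when the total sum of the bucket grows, so buckets that see little traffic stay small.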
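Estimate True Sum or Change
Estimate $S(x)$ in bucket $(i, j)$: a pair of values (a lower and an upper estimate)
Estimate $D(x)$ in bucket $(i, j)$
• Bucket at 1st epoch: a pair of estimates of $S(x)$
• Bucket at 2nd epoch: a pair of estimates of $S(x)$
• Estimate change: computed from the two pairs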
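One plausible way to turn a bucket into such estimates is sketched below. The exact formulas are not preserved in this transcript, so the following is an assumption: the counter of a key is taken as the lower estimate, the counter plus the recorded error as the upper estimate, and the change estimate as the largest gap the two pairs allow. All function names are hypothetical.

```python
def sum_bounds(A, e, x):
    """Assumed (lower, upper) estimates of the sum of key x from array A and error e."""
    low = A.get(x, 0)
    return low, low + e

def change_estimate(bucket1, bucket2, x):
    """Assumed upper estimate of the change of x between two epochs.

    Each bucket is an (A, e) pair taken from the same sketch position.
    """
    A1, e1 = bucket1
    A2, e2 = bucket2
    low1, up1 = sum_bounds(A1, e1, x)
    low2, up2 = sum_bounds(A2, e2, x)
    return max(up1 - low2, up2 - low1)

first = ({"x": 10}, 2)   # bucket at 1st epoch: counter 10, error 2
second = ({"x": 3}, 1)   # bucket at 2nd epoch: counter 3, error 1
est = change_estimate(first, second, "x")
print(est)  # 9
```

Using the upper estimate of one epoch against the lower estimate of the other keeps the change estimate from falling below the true change, provided the bounds themselves hold.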
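Identify Heavy Key
Key point: consider only keys tracked by buckets
Enumerate all buckets $(i, j)$
• If $V_{i,j} \ge \phi$, check each key $x$ tracked in the bucket
Check key $x$ for heavy hitters
• The estimate of $S(x)$ must reach $\phi$ for all rows $i$
Check key $x$ for heavy changers
• The estimate of $D(x)$ must reach $\phi$ for all rows $i$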
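The detection phase for heavy hitters might look like the sketch below. It assumes, without confirmation from this transcript, that the per-row test compares an upper estimate (counter plus error) against the threshold; the dictionary layout of a bucket, the function names, and the toy hash functions are hypothetical.

```python
def upper_estimate(bucket, x):
    """Assumed upper estimate of the sum of key x in one bucket."""
    return bucket["A"].get(x, 0) + bucket["e"]

def find_heavy_hitters(sketch, hashes, phi):
    """sketch[i][j] is a dict with 'V', 'A', 'e'; hashes[i] maps a key to a column."""
    heavy = set()
    for row in sketch:
        for bucket in row:
            if bucket["V"] < phi:          # skip buckets that cannot hold a heavy key
                continue
            for x in bucket["A"]:
                # Report x only if its estimate reaches phi in every row.
                if all(upper_estimate(sketch[i][h(x)], x) >= phi
                       for i, h in enumerate(hashes)):
                    heavy.add(x)
    return heavy

# Toy sketch with 2 rows and 2 buckets per row, after inserting a:12, b:3, c:4.
hashes = [lambda k: 0 if k in ("a", "b") else 1,
          lambda k: 0 if k in ("a", "c") else 1]
sketch = [
    [{"V": 15, "A": {"a": 12, "b": 3}, "e": 0}, {"V": 4, "A": {"c": 4}, "e": 0}],
    [{"V": 16, "A": {"a": 12, "c": 4}, "e": 0}, {"V": 3, "A": {"b": 3}, "e": 0}],
]
found = find_heavy_hitters(sketch, hashes, phi=10)
```

Requiring agreement across all rows is what filters out keys that merely share a heavy bucket in one row.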
Analysis
Let maximum number of heavy keys = 
On accuracy
• Zero false negative rate
• Upper bound of false positive rate
On complexity
• Time complexity to update a data item: 
• Time complexity to identify heavy keys: 
• Space complexity: 
Distributed Detection
[Figure: a remote site forwards data to workers running local detection; the local detection results are combined into the final detection results.]
Goal
• Scalability: reduce complexity
• Accuracy: reduce false positive rate
Remote site
• How to partition data streams
Final results
• How to aggregate local detection results
Remote Sites
Two-step partitioning of each data item $(x, v_x)$
• Step 1: select $d$ workers based on key $x$
• Step 2: select one from the $d$ workers uniformly
For the same $x$, the same $d$ workers are selected in all remote sites
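The two-step partitioning can be illustrated as follows. The slides do not specify how the workers are derived from the key, so seeding a pseudo-random generator from a hash of the key is only one possible approach; the names `candidate_workers`, `route`, `q` (total workers), and `d` (workers per key) are assumptions.

```python
import hashlib
import random

def candidate_workers(key, q, d):
    """Step 1: derive d distinct worker ids out of q from the key alone."""
    digest = hashlib.sha256(str(key).encode()).digest()
    seed = int.from_bytes(digest[:8], "big")
    return random.Random(seed).sample(range(q), d)

def route(key, q, d, rng=random):
    """Step 2: pick one of the key's d candidate workers uniformly at random."""
    return rng.choice(candidate_workers(key, q, d))

workers = candidate_workers("10.0.0.1", q=5, d=3)
chosen = route("10.0.0.1", q=5, d=3)
```

Because step 1 depends only on the key, every remote site running the same code picks the same candidate workers for that key, while step 2 spreads the key's traffic across them.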
Detection and Aggregation
Detection in workers
• For key $x$, each selected worker expects to receive $1/d$ of $S(x)$
• Perform local detection in each worker with threshold 
Aggregate results
• For key $x$: if all $d$ selected workers report $x$ in local detection, report $x$ as a heavy key
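The aggregation rule can be sketched as a simple set intersection. The data layout (`reports` mapping a worker id to the keys it reported, `candidates` mapping a key to its assigned workers) and the function name are assumptions made for illustration; the local threshold used by each worker is not reproduced here.

```python
def aggregate(reports, candidates):
    """Report a key only if every worker assigned to it reported it locally.

    reports: worker id -> set of keys reported by that worker
    candidates: key -> list of worker ids selected for that key
    """
    heavy = set()
    for x, assigned in candidates.items():
        if all(x in reports.get(wk, set()) for wk in assigned):
            heavy.add(x)
    return heavy

reports = {0: {"a", "b"}, 1: {"a"}, 2: {"b"}}
candidates = {"a": [0, 1], "b": [0, 1]}
final = aggregate(reports, candidates)
```

Key `b` is dropped because worker 1, one of its assigned workers, did not report it; this unanimity requirement is how aggregation can reduce false positives.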
Analysis
Let
• Maximum number of heavy keys = 
• Total number of workers = 
On accuracy
• Reduce false positive rate
• Introduce a small false negative rate due to unfair partitioning
On complexity
• Time complexity to update a data item: 
• Time complexity to identify heavy keys: 
• Space complexity: 
Experimental Results
Trace
• 3G UMTS network in mainland China in December 2010
• 1.1 billion packets, 600GB traffic
Approach
• Local detection: compare LD-Sketch with CGT, SeqHash, and Fast Sketch, all of which are allocated the same amount of memory
• Distributed detection: vary the value of $d$
Metrics
• Recall: (# of returned true heavy keys) / (# of true heavy keys)
• Precision: (# of returned true heavy keys) / (# of returned keys)
• Update throughput
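The two accuracy metrics translate directly into set operations. In the sketch below the function name and the convention of returning 1.0 when a denominator is empty are assumptions.

```python
def recall_precision(returned, truth):
    """Recall and precision of a set of returned keys against the true heavy keys."""
    returned, truth = set(returned), set(truth)
    hits = len(returned & truth)              # returned keys that are truly heavy
    recall = hits / len(truth) if truth else 1.0
    precision = hits / len(returned) if returned else 1.0
    return recall, precision

recall, precision = recall_precision(
    returned={"a", "b", "c", "d"}, truth={"a", "b", "e"})
```

Here two of the three true heavy keys are returned, and two of the four returned keys are true heavy keys.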
Accuracy of Local Detection: Heavy Changer
LD-Sketch achieves 100% recall
LD-Sketch has slightly lower precision than CGT and SeqHash, but this can be improved with distributed detection
Accuracy of Distributed Detection: Heavy Changer
When , the precision is similar to that of local detection
When , the precision increases significantly at the cost of a little recall
Throughput
LD-Sketch has slightly lower throughput than CGT and Fast Sketch in local detection
LD-Sketch can scale linearly in distributed detection
[Figure panels: Local detection; Distributed detection]
Conclusions
Propose LD-Sketch, a sketching approach for real-time heavy key detection in a distributed architecture
• Composed of local detection and distributed detection
Propose a sketch structure for local detection
• High accuracy
• Low complexity in space and time
• Seamlessly deployed in a distributed architecture
Propose a distributed detection scheme
• Reduce complexity
• Improve accuracy