1
Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams
Robert Schweller, Ashish Gupta, Elliot Parsons, Yan Chen
Computer Science Department, Northwestern University
2
Online Change Detection
• Network anomalies are common
– Flash crowds, failures, DoS, worms, …
Online Detection over Data Streams
• Data stream: key/update pairs (k, u)
– Heavy hitters (lots of prior work)
– Heavy changes
3
k-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]
– First to detect flow-level heavy changes in massive data streams at network traffic speeds
[Figure: H hash tables (1, …, j, …, H), each with K buckets numbered 0, 1, …, K-1]
4
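[Figure: key k is hashed into one bucket of each table by h1(k), …, hj(k), …, hH(k)]
Update (k, u): T_j[h_j(k)] += u (for all j)
Estimate v(S, k), the sum of updates for key k:
$$v(S, k) = \operatorname{median}_{j} \left\{ \frac{T_j[h_j(k)] - \mathrm{sum}/K}{1 - 1/K} \right\}$$
where sum is the total of all updates in the stream S.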
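The update and estimate rules above can be illustrated with a small Python sketch. It is a minimal, hypothetical implementation, not the authors' code: the class name `KArySketch` and the hash family h_j(x) = ((a_j·x + b_j) mod p) mod K with p = 2^61 - 1 are assumptions standing in for the H independent hash functions.

```python
import random
import statistics

P = (1 << 61) - 1  # Mersenne prime used by the stand-in hash family (assumption)


class KArySketch:
    """k-ary sketch with H tables of K counters (illustrative sketch)."""

    def __init__(self, H, K, seed=0):
        rng = random.Random(seed)
        self.H, self.K = H, K
        # Assumption: h_j(x) = ((a_j * x + b_j) mod P) mod K stands in for h_1..h_H
        self.ab = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(H)]
        self.T = [[0] * K for _ in range(H)]
        self.total = 0  # "sum" in the estimator: total of all updates

    def _h(self, j, key):
        a, b = self.ab[j]
        return ((a * key + b) % P) % self.K

    def update(self, key, u):
        # Update (k, u): T_j[h_j(k)] += u for every table j
        for j in range(self.H):
            self.T[j][self._h(j, key)] += u
        self.total += u

    def estimate(self, key):
        # Per-table estimate (T_j[h_j(k)] - sum/K) / (1 - 1/K), then the median
        per_table = [
            (self.T[j][self._h(j, key)] - self.total / self.K) / (1 - 1 / self.K)
            for j in range(self.H)
        ]
        return statistics.median(per_table)


sketch = KArySketch(H=5, K=1024)
sketch.update(42, 100)          # one heavy key
for k in range(1000, 1050):     # 50 light keys
    sketch.update(k, 1)
print(sketch.estimate(42))      # approximately 100
```

Subtracting sum/K removes the expected contribution of other keys that collide in the bucket, and the median across tables limits the damage from an unlucky collision in any single table.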
5
[Figure: which keys caused the heavy buckets? (??)]
6
• Main problem
– Cannot efficiently report the keys with heavy changes
• Our contribution
– Determine the set of keys that have “large” estimates in the sketch
• Requires very little space:
– E.g. 5 hash tables with 16K buckets = 80 KB
– Fits in high-speed memory
7
Reverse Sketch Problem
[Figure: hash tables 1–5, with “heavy” buckets marked]
Input:
– Sketch
– Threshold
Output: Set of keys that hash to heavy buckets in a majority (or all) of the hash tables
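To make the problem statement concrete, the following sketch solves it by brute force: it marks the buckets at or above the threshold and scans every key of a small 16-bit key space. The function `reverse_sketch_bruteforce`, the universal hash family, and the toy parameters (H = 5, K = 256) are assumptions for illustration; scanning 2^32 IP addresses this way is exactly what a reversible sketch is meant to avoid.

```python
import random

P = (1 << 61) - 1  # prime for the stand-in hash family (assumption)


def make_hashes(H, K, seed=0):
    # Assumption: simple universal hashes stand in for the sketch's h_1..h_H
    rng = random.Random(seed)
    params = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(H)]
    return [lambda x, a=a, b=b: ((a * x + b) % P) % K for a, b in params]


def reverse_sketch_bruteforce(T, hashes, threshold, key_space, need):
    """Return keys that hit a heavy bucket (value >= threshold) in at least
    `need` tables, by scanning the entire key space (hypothetical helper)."""
    heavy = [{b for b, v in enumerate(row) if v >= threshold} for row in T]
    return [
        k for k in key_space
        if sum(h(k) in hb for h, hb in zip(hashes, heavy)) >= need
    ]


H, K = 5, 256
hashes = make_hashes(H, K)
T = [[0] * K for _ in range(H)]
rng = random.Random(1)
updates = [(12345, 1000)] + [(rng.randrange(1 << 16), 1) for _ in range(200)]
for key, u in updates:
    for j, h in enumerate(hashes):
        T[j][h(key)] += u

# Majority of the H tables: need = 3 when H = 5
print(reverse_sketch_bruteforce(T, hashes, threshold=500,
                                key_space=range(1 << 16), need=H // 2 + 1))
```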
8
Outline
• Streaming data recording (fast): (key, value) → k-ary sketch
• Heavy change detection (slow): k-ary sketch + change threshold → heavy change keys
• Modular hashing, IP mangling, reverse hashing: algorithms to improve heavy change detection
9
Taking Intersections
• A_i = set of keys that hash to the heavy bucket in hash table i
• Intersect A1, A2, A3, A4, A5
H = 5, K = 2^12, #keys = 2^32 (IP addresses)
E[false positives] << 1
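The claim E[false positives] << 1 follows from a short calculation, shown here in Python under the assumption of one heavy bucket per table and independent, uniform hash functions: a non-heavy key lands in all H heavy buckets with probability (1/K)^H = 2^-60, so about 2^32 · 2^-60 = 2^-28 false positives are expected.

```python
# Assumptions: one heavy bucket per table, and independent hash functions that
# map each key uniformly to one of K buckets.
H = 5
K = 2 ** 12
num_keys = 2 ** 32  # IPv4 key space

set_size = num_keys // K               # |A_i| = 2^32 / 2^12 = 2^20 keys per bucket
p_all_tables = (1 / K) ** H            # a given non-heavy key hits all H heavy buckets
expected_fp = num_keys * p_all_tables  # about 2^32 * 2^-60 = 2^-28

print(set_size, expected_fp)
```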
10
The problem with simple intersection
• Why is this difficult?
• Each set Ai can be very large!
H = 5, K = 2^12, #keys = 2^32 (IP addresses)
|A1| = 2^32 / 2^12 = 2^20
11
• Solution: modular hashing
12
Modular hashing reduces the set size
[Figure: the 32-bit key 10010100 10101011 10010101 10100011 is split into four 8-bit words; h() maps each word to 3 bits (010 110 001 101), giving a 12-bit bucket index]
13
[Figure: the same key, now with a separate hash function per word, h1(), h2(), h3(), h4(), producing 010 110 001 101]
• Greatly reduces the size of reverse-mapped sets
14
2^8 / 2^3 = 2^5: each 3-bit hash value corresponds to 32 of the 256 possible 8-bit words
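The modular hashing step can be sketched as follows. The per-word hash functions here are hypothetical: each is a random permutation of the 256 byte values reduced mod 8, chosen so that every 3-bit output has exactly 2^8 / 2^3 = 2^5 = 32 preimages. The names `make_word_hashes`, `modular_hash`, and `reverse_word_sets` are illustrative, not from the tech report.

```python
import random

WORDS, WORD_BITS, OUT_BITS = 4, 8, 3  # 32-bit key -> 4 x 8-bit words -> 12-bit bucket
WORD_MASK = (1 << WORD_BITS) - 1


def make_word_hashes(seed=0):
    # Assumption: each word hash is a random permutation of 0..255 reduced mod 8,
    # so every 3-bit output has exactly 32 preimage words.
    rng = random.Random(seed)
    hashes = []
    for _ in range(WORDS):
        perm = list(range(1 << WORD_BITS))
        rng.shuffle(perm)
        hashes.append([p % (1 << OUT_BITS) for p in perm])
    return hashes


def modular_hash(key, hashes):
    # Hash each 8-bit word separately (h1..h4) and concatenate the 3-bit results
    bucket = 0
    for i in range(WORDS):
        word = (key >> ((WORDS - 1 - i) * WORD_BITS)) & WORD_MASK
        bucket = (bucket << OUT_BITS) | hashes[i][word]
    return bucket


def reverse_word_sets(bucket, hashes):
    # Reverse-map a bucket: for each 3-bit chunk, the 8-bit words that produce it
    sets = []
    for i in range(WORDS):
        chunk = (bucket >> ((WORDS - 1 - i) * OUT_BITS)) & ((1 << OUT_BITS) - 1)
        sets.append({w for w in range(1 << WORD_BITS) if hashes[i][w] == chunk})
    return sets


hashes = make_word_hashes()
key = 0b10010100_10101011_10010101_10100011
bucket = modular_hash(key, hashes)
sets = reverse_word_sets(bucket, hashes)
print(format(bucket, "012b"), [len(s) for s in sets])  # 12-bit bucket, then [32, 32, 32, 32]
```

Because each partition holds only 32 candidate words, a reverse-mapped set can be stored as four small sets (4 × 32 words) instead of an explicit list of 32^4 = 2^20 keys.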
15
[Figure: hash tables 1–5, with heavy buckets b1–b5]
A1: 2^5 × 2^5 × 2^5 × 2^5
Intersection: only 32 elements per partition
16
A1: 2^5 × 2^5 × 2^5 × 2^5    A2: 2^5 × 2^5 × 2^5 × 2^5
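Because each reverse-mapped set is a product of per-word partitions, A1 ∩ A2 can be computed one partition at a time. The sketch below does this with hypothetical per-word hashes (random permutations reduced mod 8); `reverse_sets`, `intersect_partitions`, and `candidate_keys` are illustrative names, and the code covers only the simple case where the heavy key hits a single heavy bucket in each table.

```python
import random
from itertools import product


def word_hashes(rng):
    # Assumption: per-word hash = random permutation of 0..255 reduced mod 8
    # (each 3-bit value has exactly 32 preimages)
    out = []
    for _ in range(4):
        perm = list(range(256))
        rng.shuffle(perm)
        out.append([p % 8 for p in perm])
    return out


def reverse_sets(key, hashes):
    # A_i as four partitions: the words that hash like the corresponding word of `key`
    words = [(key >> s) & 0xFF for s in (24, 16, 8, 0)]
    return [{w for w in range(256) if h[w] == h[x]} for h, x in zip(hashes, words)]


def intersect_partitions(*reverse_mapped):
    # Intersect partition by partition: at most 32 words per partition to compare
    return [set.intersection(*parts) for parts in zip(*reverse_mapped)]


def candidate_keys(partitions):
    # Surviving keys are the Cartesian product of the surviving words
    for w0, w1, w2, w3 in product(*(sorted(p) for p in partitions)):
        yield (w0 << 24) | (w1 << 16) | (w2 << 8) | w3


rng = random.Random(1)
tables = [word_hashes(rng) for _ in range(5)]  # H = 5 hypothetical tables
key = 0x94AB95A3                               # 10010100 10101011 10010101 10100011
A1, A2 = reverse_sets(key, tables[0]), reverse_sets(key, tables[1])
two = intersect_partitions(A1, A2)
five = intersect_partitions(*(reverse_sets(key, t) for t in tables))
print([len(p) for p in two], [len(p) for p in five])
```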
17
Handling Multiple Intersections…
[Figure: hash tables 1–5, each now with two heavy buckets (b1–b5 marked twice)]
• 2^H different intersections (one heavy bucket chosen from each of the H tables)
• Much more difficult: need sophisticated reverse hashing algorithms (see tech report)
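The 2^H count comes from choosing one heavy bucket in every table. The snippet below only enumerates those choices for a hypothetical example with two heavy buckets in each of H = 5 tables (the bucket ids are made up); it is the naive starting point, not the reverse hashing algorithm from the tech report.

```python
from itertools import product

# Hypothetical heavy buckets: two per table across H = 5 tables (ids are made up)
heavy_buckets = [[17, 903], [4, 2210], [88, 3001], [1500, 7], [640, 2047]]

# Naive approach: one intersection per way of choosing a heavy bucket from each table
combinations = list(product(*heavy_buckets))
print(len(combinations))  # 2 ** 5 = 32 intersections to try
```

Each tuple would need its own partition-by-partition intersection, so the work grows exponentially with H.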
18
Problem: Too many collisions
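[Figure: the addresses 129.105.56.23, 129.105.56.28, 129.105.56.109, 129.105.56.35, 129.105.56.98, … (32 bits) all map to buckets 7 . 4 . 0 . * (12 bits)]
Keys that share their first three bytes also share the first three 3-bit hash values, so they fall into at most 2^3 = 8 buckets.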
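The collision problem can be reproduced with modular hashing. In this sketch the per-word hashes are hypothetical random permutations reduced mod 8; every address in 129.105.56.* shares its first three words, so only the last 3-bit chunk of the bucket index can differ.

```python
import random


def word_hashes(seed=0):
    # Assumption: per-word hash = random permutation of 0..255 reduced mod 8
    rng = random.Random(seed)
    out = []
    for _ in range(4):
        perm = list(range(256))
        rng.shuffle(perm)
        out.append([p % 8 for p in perm])
    return out


def modular_hash(key, hashes):
    # Concatenate the 3-bit hashes of the four 8-bit words into a 12-bit bucket
    bucket = 0
    for h, shift in zip(hashes, (24, 16, 8, 0)):
        bucket = (bucket << 3) | h[(key >> shift) & 0xFF]
    return bucket


def ip(a, b, c, d):
    return (a << 24) | (b << 16) | (c << 8) | d


hashes = word_hashes()
subnet = [ip(129, 105, 56, d) for d in range(256)]  # every address in 129.105.56.*
buckets = {modular_hash(x, hashes) for x in subnet}
print(len(buckets))  # 8: only the last 3-bit chunk can vary
```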
19
Solution: IP Mangling
20
IP-mangling
21
Invertible Modular Linear Equation
f(x) = a·x mod n
To be invertible: a and n must be relatively prime
• a is odd, chosen randomly (with n = 2^32 for 32-bit keys, any odd a is relatively prime to n)
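IP mangling can be sketched in a few lines, assuming n = 2^32 for 32-bit keys. Forcing a to be odd makes it relatively prime to 2^32, so Python's built-in `pow(a, -1, n)` (Python 3.8+) returns the inverse used to undo the mangling. The per-word hashes and the function names are hypothetical; the example compares how many buckets the addresses of 129.105.56.* reach with and without mangling.

```python
import random

N = 1 << 32  # assumption: n = 2^32 for 32-bit IPv4 keys


def make_multiplier(seed=0):
    rng = random.Random(seed)
    a = rng.getrandbits(32) | 1  # force a odd, so gcd(a, 2^32) = 1
    return a, pow(a, -1, N)      # modular inverse (Python 3.8+)


def mangle(x, a):
    return (a * x) % N           # f(x) = a·x mod n


def unmangle(y, a_inv):
    return (a_inv * y) % N       # f^-1(y) = a^-1·y mod n


def modular_hash(key, hashes):
    # Concatenate the 3-bit hashes of the four 8-bit words into a 12-bit bucket
    bucket = 0
    for h, shift in zip(hashes, (24, 16, 8, 0)):
        bucket = (bucket << 3) | h[(key >> shift) & 0xFF]
    return bucket


# Hypothetical per-word hashes: random permutations of 0..255 reduced mod 8
rng = random.Random(0)
hashes = []
for _ in range(4):
    perm = list(range(256))
    rng.shuffle(perm)
    hashes.append([p % 8 for p in perm])

a, a_inv = make_multiplier()
subnet = [(129 << 24) | (105 << 16) | (56 << 8) | d for d in range(256)]
plain = {modular_hash(x, hashes) for x in subnet}
mangled = {modular_hash(mangle(x, a), hashes) for x in subnet}
print(len(plain), len(mangled))
```

Because f is a bijection on 32-bit keys, mangling changes only where keys land, not how many distinct keys there are, and reverse IP mangling recovers the original addresses exactly.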
22
[Figure panels: Modular Hashing | Optimal Hashing]
23
[Figure panels: Modular Hashing | Modular Hashing with IP Mangling | Optimal Hashing]
24
Recap:
• Streaming data recording: (key, value) → IP mangling → modular hashing → reversible k-ary sketch (stored value)
• Heavy change detection: reversible k-ary sketch + change threshold → reverse hashing → reverse IP mangling → heavy change keys
• Complexity annotations in the figure: $n^{1/\log\log n}$ and $\log n / \log\log n$
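The recap pipeline can be tied together in one toy example. This is a simplified sketch under several assumptions: it thresholds raw bucket counters instead of k-ary estimates, handles only the case of one heavy bucket per table intersected over all tables, and uses hypothetical per-word hashes (random permutations reduced mod 8). `ToyReversibleSketch` and its methods are illustrative names, not the tech report's algorithm. For heavy change detection, the reverse step would presumably run on the difference of two intervals' counters, which is possible because every counter is a plain sum of updates.

```python
import random
from itertools import product

N = 1 << 32  # assumption: 32-bit IPv4 keys


class ToyReversibleSketch:
    """IP mangling -> modular hashing -> H tables of 4096 counters (toy)."""

    def __init__(self, H=5, seed=0):
        rng = random.Random(seed)
        self.a = rng.getrandbits(32) | 1   # odd multiplier for IP mangling
        self.a_inv = pow(self.a, -1, N)    # used for reverse IP mangling
        # hashes[j][i]: word i of table j, a 256 -> 8 map with 32 preimages per value
        self.hashes = []
        for _ in range(H):
            words = []
            for _ in range(4):
                perm = list(range(256))
                rng.shuffle(perm)
                words.append([p % 8 for p in perm])
            self.hashes.append(words)
        self.T = [[0] * 4096 for _ in range(H)]

    def _bucket(self, j, m):
        # Modular hashing of the mangled key m in table j
        b = 0
        for i, shift in enumerate((24, 16, 8, 0)):
            b = (b << 3) | self.hashes[j][i][(m >> shift) & 0xFF]
        return b

    def update(self, key, u):
        m = (self.a * key) % N             # IP mangling
        for j in range(len(self.T)):
            self.T[j][self._bucket(j, m)] += u

    def heavy_keys(self, threshold):
        heavy = []
        for row in self.T:
            hb = [b for b, v in enumerate(row) if v >= threshold]
            if len(hb) != 1:
                raise ValueError("toy version needs exactly one heavy bucket per table")
            heavy.append(hb[0])
        # Reverse hashing, easy case: intersect word preimages across all tables
        parts = []
        for i in range(4):
            shift = (3 - i) * 3
            parts.append(set.intersection(*(
                {w for w in range(256) if self.hashes[j][i][w] == (b >> shift) & 7}
                for j, b in enumerate(heavy)
            )))
        keys = []
        for w0, w1, w2, w3 in product(*(sorted(p) for p in parts)):
            m = (w0 << 24) | (w1 << 16) | (w2 << 8) | w3
            keys.append((self.a_inv * m) % N)  # reverse IP mangling
        return sorted(keys)


sketch = ToyReversibleSketch()
heavy_ip = (129 << 24) | (105 << 16) | (56 << 8) | 23  # 129.105.56.23
sketch.update(heavy_ip, 10_000)
rng = random.Random(7)
for _ in range(300):
    sketch.update(rng.getrandbits(32), 1)              # light background keys
print(sketch.heavy_keys(threshold=5_000))
```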
25
Evaluation
• Traffic traces from the Northwestern University edge router
– 5-minute intervals, averaging 7.5 GB of traffic per interval
• Compared with ground truth
• 6 hash tables, 4K buckets each, 192 KB of memory in total
• Up to 140 true heavy change keys in 1.5 seconds
– Over 95% TPP
– Less than 2% FPP
• All missing changes are due to boundary effects
26
Conclusions / Future Work
• Sketches: efficient summary structures
• Our contribution: reversible sketches
– Efficient online detection of keys with heavy changes
Work in Progress (see tech report)
• Improved reverse hashing
• Statistical guarantee on detection accuracy
• More advanced applications:
– Hierarchical change detection, e.g. 129.105.100.* shows a big change!
27
See tech report for more!
http://list.cs.northwestern.edu
Thank you !