SCREAM: Sketch Resource Allocation for Software-defined Measurement
Masoud Moshref, Minlan Yu, Ramesh Govindan, Amin Vahdat (CoNEXT’15)



Measurement is Crucial for Network Management

Network management on multiple tenants:
• Accounting
• Anomaly detection
• Traffic engineering
• DDoS detection

Measurement tasks:
• Heavy hitter detection (HH)
• Hierarchical heavy hitter detection (HHH)
• Super source detection (SSD)
• Change detection

Need fine-grained visibility of network traffic

Software Defined Measurement

[Figure: a controller running DREAM [SIGCOMM’14] / SCREAM [CoNEXT’15] configures per-task counters (Task 1, Task 2) on Switch A and Switch B and collects them]

Our Focus: Sketch-based Measurement

Sketches: summaries of streaming data that approximately answer specific queries, e.g., a bitmap for counting unique items

OpenFlow counters (DREAM [SIGCOMM’14]) vs. sketches (SCREAM [CoNEXT’15]):
• Memory: expensive, power-hungry TCAM vs. cheaper SRAM
• Counters: volume counters vs. volume and connection counters
• Flows: selected prefixes vs. all traffic, all the time

Sketches use cheaper memory and are more expressive
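To make "summaries of streaming data" concrete, here is a minimal Python sketch of the bitmap example using linear counting, a standard distinct-count estimator: each item sets one hashed bit, and the estimate is -m ln(fraction of zero bits). The class name `Bitmap`, the SHA-256 hashing, and the bitmap size are illustrative assumptions, not SCREAM's implementation.

```python
import hashlib
import math


class Bitmap:
    """Linear-counting bitmap: estimates the number of distinct items with m bits."""

    def __init__(self, m: int) -> None:
        self.m = m
        self.bits = [0] * m

    def add(self, item: str) -> None:
        # Hash the item to one bit; a duplicate item sets the same bit again.
        digest = hashlib.sha256(item.encode()).digest()
        self.bits[int.from_bytes(digest[:8], "big") % self.m] = 1

    def estimate(self) -> float:
        # Linear counting: n is approximately -m * ln(fraction of bits still zero).
        zeros = self.bits.count(0)
        if zeros == 0:
            return math.inf  # saturated: m is too small for this many items
        return -self.m * math.log(zeros / self.m)


# Usage: count distinct source IPs; repeated sources do not inflate the count.
bm = Bitmap(1024)
for i in range(100):
    bm.add(f"10.0.0.{i}")
    bm.add(f"10.0.0.{i}")
print(round(bm.estimate()))  # roughly 100; the exact value depends on hash collisions
```

Duplicates hash to the same bit, so a source seen many times counts once; the estimate degrades as the bitmap fills up, which is the resource-accuracy trade-off in miniature.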

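Sketch Example: Count-Min Sketch

A Count-Min sketch has d rows of counters, each row with its own hash function (h1, h2, h3 in the example).

At packet arrival: for a packet (IP, 1 Kbytes), add 1 to the counter selected by h1(IP), h2(IP), h3(IP) in each row: 2+1=3, 4+1=5, 1+1=2.

At query: What is the traffic size of IP? Use the row with the minimum collision: Min(3, 5, 2) = 2.

Resource-accuracy trade-off: provable error bound given traffic properties (e.g., skew)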
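A minimal Python sketch of the two Count-Min operations above, assuming one SHA-256-derived hash function per row; the class and method names are hypothetical, not SCREAM's code.

```python
import hashlib


class CountMin:
    """Minimal Count-Min sketch with d rows of w counters (illustrative only)."""

    def __init__(self, d: int, w: int) -> None:
        self.d = d
        self.w = w
        self.rows = [[0] * w for _ in range(d)]

    def _index(self, row: int, key: str) -> int:
        # One hash function per row, derived by salting SHA-256 with the row number.
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.w

    def update(self, key: str, size: int) -> None:
        # At packet arrival: add the packet size to one counter in every row.
        for r in range(self.d):
            self.rows[r][self._index(r, key)] += size

    def query(self, key: str) -> int:
        # At query: collisions only add, so every counter over-estimates;
        # the minimum picks the row with the fewest collisions.
        return min(self.rows[r][self._index(r, key)] for r in range(self.d))


# Usage: three rows, as in the slide's example; sizes in KB.
cm = CountMin(d=3, w=8)
cm.update("10.0.0.1", 1)
cm.update("10.0.0.2", 4)
print(cm.query("10.0.0.1"))  # at least 1; larger only if 10.0.0.2 collides in every row
```

Because collisions can only add to a counter, `query` never under-estimates; more counters per row (larger w) mean fewer collisions and a smaller over-estimate.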

Challenges: Limited Counters for Many Tasks

Many task instances:
• 3 types (heavy hitter, hierarchical heavy hitter, super source)
• Different flow aggregates (rack, app, src/dst/port)
• 1000s of tenants

Limited shared resources:
• SRAM capacity (e.g., 128 MB)
• Shared with other functions (e.g., routing)

Too many resources needed to guarantee accuracy:
• 1 MB-32 MB per task
• At most 4-128 tasks fit in SRAM (128 MB / 32 MB = 4 up to 128 MB / 1 MB = 128), and fewer once SRAM is shared with other functions

Goal: Many Accurate Sketch-based Measurements


Users dynamically instantiate a variety of measurement tasks

SCREAM supports the largest number of measurement tasks while maintaining measurement accuracy

Approach: Dynamic Resource Allocation

• Resource-accuracy trade-off depends on traffic
  Count-Min: provable error bound given traffic properties, e.g., skew of traffic from each IP
• Worst-case allocation uses >10x more counters than the average case
• Dynamic allocation for current traffic

[Figure: required memory vs. skew]
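For reference, the standard worst-case guarantee of a Count-Min sketch (Cormode and Muthukrishnan) sizes the sketch for an additive error relative to the total traffic N, regardless of skew; this is the kind of allocation that can be far larger than what skewed traffic actually needs. SCREAM's skew-aware sizing is not reproduced here. Here f_x is the true size of item x and \hat{f}_x its Count-Min estimate.

```latex
w = \left\lceil \frac{e}{\varepsilon} \right\rceil, \qquad
d = \left\lceil \ln \frac{1}{\delta} \right\rceil
\quad \Longrightarrow \quad
\Pr\!\left[\hat{f}_x > f_x + \varepsilon N\right] \le \delta,
\qquad \hat{f}_x \ge f_x \text{ always}
```

For example, \varepsilon = 0.001 and \delta = 0.01 give w = \lceil 2718.28 \rceil = 2719 and d = \lceil 4.61 \rceil = 5, i.e., 13,595 counters per task, whatever the traffic looks like.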

Opportunity: Temporal Multiplexing

Memory requirement varies over time
Multiplex memory among tasks over time

[Figure: required memory of Task 1 and Task 2 over time]

Opportunity: Spatial Multiplexing

Memory requirement varies across switches
Multiplex memory among tasks across switches

[Figure: required memory of Task 1 and Task 2 on Switch A and Switch B]

Key Insight

Leverage spatial and temporal multiplexing and dynamically allocate switch memory per task to achieve sufficient accuracy for many tasks
• DREAM has the same insight
• SCREAM applies it to sketches

SCREAM Contributions

1- Supports 3 sketch-based task types: heavy hitter (HH), hierarchical heavy hitter (HHH), and super source detection (SSD) tasks
2- A dynamic resource allocator that allocates memory among sketch-based task instances across switches while maintaining sufficient accuracy

[Figure: anomaly detection, traffic engineering, and DDoS detection on top of SCREAM]

SCREAM Iterative Workflow

[Figure: a loop of three steps: collect & report counters from many switches, estimate accuracy, allocate resources (memory size) based on accuracy]

SCREAM Iterative Workflow

[Figure: accuracy (%) and allocated memory (KB) of Task 1 and Task 2 over time (s), annotated with the collect & report, estimate accuracy, and allocate resources steps]

Task 1 accuracy < 80% → give more memory to Task 1

SCREAM Iterative Workflow

[Figure: precision (%) and allocated memory (KB) of Task 1 and Task 2 over time (s); collect & report merges counters from switches]

Skew of traffic for Task 2 changes → Task 2 accuracy < 80% → give more memory to Task 2
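The two workflow examples follow the same rule: a task whose estimated accuracy falls below the 80% bound gets more memory. The Python sketch below is a deliberately simplified, hypothetical allocation round on a single switch; the step size, the `Task` fields, and the donor policy are assumptions for illustration, and the actual DREAM/SCREAM allocator (which also targets fast and stable convergence across switches) differs.

```python
from dataclasses import dataclass

ACCURACY_BOUND = 0.8  # the 80% bound used in the slides' example
STEP_KB = 8           # hypothetical per-round allocation step


@dataclass
class Task:
    name: str
    memory_kb: int
    accuracy: float  # accuracy estimated from the last round of collected counters


def allocate_round(tasks: list[Task], capacity_kb: int,
                   step_kb: int = STEP_KB, bound: float = ACCURACY_BOUND) -> None:
    """One hypothetical allocation round on a single switch (not DREAM's algorithm)."""
    free_kb = capacity_kb - sum(t.memory_kb for t in tasks)
    # Tasks below the bound, worst first; tasks at or above it, most accurate first.
    poor = sorted((t for t in tasks if t.accuracy < bound), key=lambda t: t.accuracy)
    rich = sorted((t for t in tasks if t.accuracy >= bound),
                  key=lambda t: t.accuracy, reverse=True)
    for task in poor:
        if free_kb < step_kb:
            # Reclaim one step from the most accurate task that can spare it.
            for donor in rich:
                if donor.memory_kb > step_kb:
                    donor.memory_kb -= step_kb
                    free_kb += step_kb
                    break
        if free_kb >= step_kb:
            task.memory_kb += step_kb
            free_kb -= step_kb


# Usage: Task 1 is below 80%, so it gains memory taken from Task 2.
t1 = Task("task1", 30, 0.7)
t2 = Task("task2", 30, 0.95)
allocate_round([t1, t2], capacity_kb=60)
print(t1.memory_kb, t2.memory_kb)  # 38 22
```

Taking memory from the most accurate task first keeps donors as far above the bound as possible; a real allocator would also have to avoid pushing a donor below the bound and oscillating between rounds.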

SCREAM Challenges

• Collect & report: network-wide task implementation using sketches
• Estimate accuracy: accuracy estimation without the ground truth
• Allocate resources: fast & stable allocation in DREAM [SIGCOMM’14]

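Challenge: Merge Sketches of Different Sizes

Network-wide task: heavy hitter (HH) detection, e.g., source IPs sending > 10 Mbps
A source IP's traffic is split across switches: 10 at Switch A and 15 at Switch B, 25 in total. Each switch's sketch has d rows, but Switch A's rows have width w1 and Switch B's have width w2, so their counters do not line up and cannot simply be added counter by counter.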

SCREAM Solution to Merge Sketches for HH Detection

For one IP, Switch A's counters (one per row) are 10, 30, 70 and Switch B's are 40, 50, 20; its true size is 10 + 15 = 25.
• Previous work, min of sums: min(10+40, 30+50, 70+20) = min(50, 80, 90) = 50
• SCREAM, sum of mins: min(10, 30, 70) + min(40, 50, 20) = 10 + 20 = 30

Both over-approximate the true size; the smaller one is more accurate.
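The two merge rules can be checked with the slide's numbers. In this Python sketch, each switch contributes the item's counter from each of its d rows, obtained with that switch's own hash functions, so the widths w1 and w2 may differ; only the number of rows must match for min of sums. The function names are hypothetical.

```python
def min_of_sums(per_switch_counters: list[list[int]]) -> int:
    """Previous work: add each row's counters across switches, then take the min over rows."""
    return min(sum(row_values) for row_values in zip(*per_switch_counters))


def sum_of_mins(per_switch_counters: list[list[int]]) -> int:
    """SCREAM: take each switch's Count-Min estimate (min over rows), then add across switches."""
    return sum(min(counters) for counters in per_switch_counters)


# The item's counter in each of the d = 3 rows on each switch (values from the slide).
switch_a = [10, 30, 70]
switch_b = [40, 50, 20]
print(min_of_sums([switch_a, switch_b]))  # 50
print(sum_of_mins([switch_a, switch_b]))  # 30
```

Sum of mins is never larger than min of sums: for every row r, the sum of that row's counters across switches is at least the sum of each switch's minimum. Since each switch's minimum already over-approximates that switch's share, both rules over-approximate the total, and sum of mins is the tighter one.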

SCREAM Solutions

• Collect & report: network-wide task implementation using sketches
  • Merge sketches of different sizes for HH, HHH, SSD
  • SSD algorithm with higher and more stable accuracy
• Estimate accuracy: accuracy estimation without the ground truth
• Allocate resources: fast & stable allocation in DREAM [SIGCOMM’14]

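Precision Estimation for Heavy Hitter Detection

[Figure: estimated and real sizes of a true HH and a false HH relative to the threshold, marking the error and Estimate - Threshold]

$$\text{Precision} = \frac{|\text{true detected HHs}|}{|\text{detected HHs}|}, \qquad \mathbb{E}\left[|\text{true detected HHs}|\right] = \sum_{\text{detected HHs}} P[\text{detected HH is true}]$$

Insight: relate this probability to the error on the counters of detected HHs:

$$P[\text{detected HH is true}] = 1 - P[\text{Error} \ge \text{Estimate} - \text{Threshold}]$$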

Precision Estimation Step 1: Find a Bound on the Error

Insight: relate the probability to the error on the counters of detected HHs:

$$P[\text{detected HH is true}] = 1 - P[\text{Error} \ge \text{Estimate} - \text{Threshold}]$$

Idea 1: use the average error in Markov's inequality to bound P[Error ≥ Estimate - Threshold]

[Figure: a row in Count-Min]
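Under the textbook Count-Min analysis, Idea 1 can be written out as below. Assumptions not stated on the slides: rows use independent hash functions, E_r is the collision error of the item's counter in row r, and the average error \bar{E} of a row of width w is approximated by N/w, where N is the total traffic in the sketch. The exact form used in SCREAM may differ.

```latex
\begin{aligned}
\Pr[E_r \ge a] &\le \frac{\bar{E}}{a}, \qquad \bar{E} \approx \frac{N}{w}
  && \text{(Markov's inequality, one row } r\text{)} \\
\Pr[\text{Error} \ge a] &= \prod_{r=1}^{d} \Pr[E_r \ge a] \le \left(\frac{\bar{E}}{a}\right)^{d}
  && (\text{Error} = \min_r E_r,\ \text{independent rows}) \\
P[\text{detected HH is true}] &\ge 1 - \left(\frac{\bar{E}}{\text{Estimate} - \text{Threshold}}\right)^{d}
  && (a = \text{Estimate} - \text{Threshold} > 0)
\end{aligned}
```

The bound is only informative when \bar{E} is smaller than Estimate - Threshold; an HH detected just above the threshold gets a bound of 1 on its error probability and therefore no credit.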

Precision Estimation Step 2: Improve the Bound

Insight:
• Average error = collisions with heavy items + collisions with small items
• The counter indices of detected HHs reveal the heavy collisions

Idea 2: apply Markov's inequality only to the small items

[Figure: bounds from Idea 1 vs. Idea 2]
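A hypothetical Python sketch of both ideas, computing estimated precision as the average of P[detected HH is true] over the detected HHs. Assumptions not taken from the slides: rows are independent, the average error of a row is total / w, small-item traffic is approximated by subtracting the detected HHs' estimates from the total, and the caller supplies `heavy_collisions` (the volume of other detected HHs sharing each counter). SCREAM's actual estimators may differ.

```python
def precision_idea1(estimates: list[float], threshold: float,
                    total: float, w: int, d: int) -> float:
    """Idea 1: Markov's inequality on the average row error (assumed to be total / w)."""
    if not estimates:
        raise ValueError("precision needs at least one detected HH")
    avg_err = total / w
    p_true = []
    for est in estimates:
        slack = est - threshold  # Estimate - Threshold
        p_false = min(1.0, (avg_err / slack) ** d) if slack > 0 else 1.0
        p_true.append(1.0 - p_false)
    return sum(p_true) / len(p_true)


def precision_idea2(estimates: list[float], heavy_collisions: list[list[float]],
                    threshold: float, total: float, w: int) -> float:
    """Idea 2 (hypothetical form): Markov's inequality only for small-item collisions.

    heavy_collisions[i][r] is the volume of other detected HHs that share
    item i's counter in row r (known from the detected HHs' counter indices).
    """
    if not estimates:
        raise ValueError("precision needs at least one detected HH")
    # Traffic not attributed to detected HHs, spread over the w counters of a row.
    small_avg = max(0.0, total - sum(estimates)) / w
    p_true = []
    for est, heavy in zip(estimates, heavy_collisions):
        slack = est - threshold
        p_false = 1.0
        for heavy_r in heavy:  # rows are assumed independent
            room = slack - heavy_r  # small-item error needed to reach the slack
            p_false *= min(1.0, small_avg / room) if room > 0 else 1.0
        p_true.append(1.0 - p_false)
    return sum(p_true) / len(p_true)


# Usage: two detected HHs, threshold 50, total traffic 1000, d = 3 rows of w = 100.
p1 = precision_idea1([100, 60], threshold=50, total=1000, w=100, d=3)
p2 = precision_idea2([100, 60], [[0, 0, 0], [0, 0, 0]], threshold=50, total=1000, w=100)
print(p1, p2)
```

Idea 2 applies Markov's inequality to a smaller average (only the small items), so its bound is usually tighter and fewer true HHs are counted as doubtful. Subtracting estimates, which over-count, slightly under-states small-item traffic, so a careful version might subtract something more conservative.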

SCREAM Solutions

• Collect & report: network-wide task implementation using sketches
  • Merge sketches of different sizes for HH, HHH, SSD
  • SSD algorithm with higher and more stable accuracy
• Estimate accuracy without the ground truth: precision estimators for HH, HHH and SSD tasks
• Allocate resources: fast & stable allocation in DREAM [SIGCOMM’14]

Evaluation

Metrics:
• Satisfaction of a task: fraction of the task’s lifetime with sufficient accuracy
• % of rejected tasks

Alternatives:
• OpenSketch: allocate for a bounded error under worst-case traffic at task instantiation (tested with different bounds)
• Oracle: knows the required resources for a task in each switch in advance
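The two metrics are simple to compute once per-task accuracy is sampled over time. This Python sketch assumes a task's lifetime is divided into equal-length epochs and that "sufficient" means accuracy at or above the bound; the function names are hypothetical.

```python
def satisfaction(accuracy_per_epoch: list[float], bound: float = 0.8) -> float:
    """Fraction of a task's lifetime (in equal epochs) with accuracy >= bound."""
    if not accuracy_per_epoch:
        raise ValueError("task has no measured epochs")
    return sum(a >= bound for a in accuracy_per_epoch) / len(accuracy_per_epoch)


def rejected_percent(num_rejected: int, num_tasks: int) -> float:
    """Percentage of submitted task instances that were rejected."""
    return 100.0 * num_rejected / num_tasks


# Usage: a task that met the 80% bound in 3 of its 4 epochs.
print(satisfaction([0.9, 0.7, 0.85, 0.95]))  # 0.75
print(rejected_percent(64, 256))             # 25.0
```

The two metrics pull in opposite directions: admitting every task keeps rejections at zero but can starve tasks of memory, lowering their satisfaction.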

Evaluation Setting

Simulation of 8 switches:
• 256 task instances (HH, HHH, SSD, combination)
• Accuracy bound = 80%
• Tasks lasting 5 minutes, arriving within 20 minutes
• 2-hour CAIDA trace

[Figure: rejected tasks (%) and average satisfaction vs. switch capacity (128-512 KB) for OpenSketch (OS_10, OS_50, OS_90) and SCREAM]

SCREAM Provides High Accuracy for More Tasks

SCREAM: high satisfaction and low rejection

OpenSketch:
• Loose bound → under-provisioning → low satisfaction
• Tight bound → over-provisioning → high rejection

SCREAM’s Performance Is Close to an Oracle

[Figure: rejected tasks (%) and average satisfaction vs. switch capacity (128-512 KB) for Oracle and SCREAM]

SCREAM’s performance is close to an oracle's; its satisfaction is slightly lower because:
• Iterative allocation takes time
• Accuracy estimation has error

Other Evaluations

• Accuracy estimation error: SCREAM’s accuracy estimation has 5% error on average
• Changing traffic skew: SCREAM supports more accurate tasks than OpenSketch
• Other accuracy metrics: tasks in SCREAM have high recall (low false negatives)

Conclusion

Practical sketch-based SDM by dynamic memory allocation:
• Implementing network-wide tasks using sketches
• Estimating accuracy for 3 types of tasks

SCREAM is available at github.com/USC-NSL/SCREAM

Measurement is crucial for SDN management in a resource-constrained environment

Thanks! Questions?
