SNMP & Flow Introduction

  • 8/3/2019 SNMP & Flow Introduction

    1/22

    Streaming

    L. Becchetti

    What is streaming?

    Count-Min sketch

    Inner product and join

    Heavy hitters

    Compact data structures: data streaming

    Luca Becchetti

    Sapienza Universita di Roma Rome, Italy

    April 26, 2008

    http://www.dis.uniroma1.it/~becchet/ http://www.uniroma1.it/

    1 What is streaming?

    2 Count-Min sketch

    3 Inner product and join

    4 Heavy hitters


    Informally...

    Streaming involves ([Muthukrishnan, 2005a]):

    Small number of passes over data. (Typically 1?)

    Sublinear space (sublinear in the universe or number of stream items?)

    A model of computation...

    Similar to dynamic, online, approximation or randomized algorithms, but with more constraints

    Constraints impose limitations that make many easy problems hard (further in this lecture)

    Being poly-time/poly-space is no longer sufficient


    Motivations

    US backbones [Odlyzko, 2003] [Ipoque GMBH, 2007]

    Traffic explosion in past years [Muthukrishnan, 2005a]

    30 billion emails, 1 billion SMS and IMs daily (2005); 1 billion packets per router per hour


    Motivations

    [Figure: network diagram with two LANs, the Internet and a router, annotated with SNMP log, flow log and packet log.]

    Logs

    SNMP: (Router ID, Interface ID, Timestamp, Bytes sent since last obs.)

    Flow: (Source IP, Dest IP, Start Time, Duration, No. Packets, No. Bytes)

    Packet: (Source IP, Dest IP, Src/Dest Port Numbers, Time, No. Bytes)


    Challenges [Muthukrishnan, 2005a]

    1 link with 2 Gb/s. Say avg packet size is 50 bytes

    Number of pkts/sec = 5 million

    Time per pkt = 0.2 microseconds (time available for processing)

    If we capture pkt headers per packet (src/dest IP, time, no. of bytes, etc.): at least 10 bytes

    Space per second is 50 MB. Space per day is about 4.3 TB per link (the source quotes 4.5 Tb)

    ISPs have hundreds of links.

    Focus is on solutions for real applications

    Note: we seek solutions that work in practice: easy to implement, require small space, allow fast updates and queries
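The arithmetic on this slide can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the inputs are the slide's own assumptions, and the constant names are made up for the illustration.

```python
# Inputs taken from the slide; all names are illustrative.
LINK_RATE_BITS_PER_S = 2_000_000_000   # one 2 Gb/s link
AVG_PACKET_BYTES = 50                  # assumed average packet size
HEADER_BYTES = 10                      # bytes logged per packet (lower bound)
SECONDS_PER_DAY = 24 * 60 * 60

packets_per_s = LINK_RATE_BITS_PER_S // 8 // AVG_PACKET_BYTES
time_per_packet_us = 1_000_000 / packets_per_s
log_bytes_per_s = packets_per_s * HEADER_BYTES
log_bytes_per_day = log_bytes_per_s * SECONDS_PER_DAY

print(packets_per_s)        # 5000000 packets per second
print(time_per_packet_us)   # 0.2 microseconds per packet
print(log_bytes_per_s)      # 50000000 bytes (50 MB) per second
print(log_bytes_per_day)    # 4320000000000 bytes (about 4.3 TB) per day
```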


    Some typical queries [Muthukrishnan, 2005a]

    How many distinct IP addresses use a given link currently or anytime during the day?

    What are the top k voluminous flows currently in progress in a link?

    How many distinct flows were observed?

    Are traffic patterns in two routers correlated? What are (un)usual trends?

    Network monitoring is just one possible application

    On-line statistics on search engine query logs

    On-line statistics on server logs

    ...


    The streaming model [Muthukrishnan, 2005b]

    [Figure: streaming model. Updates $U_j, U_{j+1}, U_{j+2}, U_{j+3}$ flow one after another into a processing unit.]

    Data flow

    Data items arrive over time

    $U_j$ processed before $U_{j+1}$ arrives

    Only one pass (or few passes, at most O(log))


    The streaming model [Muthukrishnan, 2005b]

    Underlying data (signal): an n-dimensional array $A$, n typically large (e.g., the size of the IP address space)

    Updates arrive over time. The j-th update is a pair $U_j = (i, x)$, where $i$ is an index:

    $A_i = A_i + x$

    In general, x can be any

    Initially: $A_i = 0$, $i = 1, \ldots, n$

    $A(t)$: the state of the array after the first t updates

    Goal

    Compute and maintain functions over $A$ in small space, with fast updates and computation

    typically, space
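As a point of reference, the model can be simulated exactly by storing the whole array. The sketch below is this naive linear-space baseline, not a streaming algorithm; the function name and the sample stream are hypothetical.

```python
def apply_updates(n, updates):
    """Exact baseline: keep the whole array A and apply each update (i, x)."""
    A = [0] * (n + 1)        # index 0 unused so that indices run 1..n
    for i, x in updates:     # the j-th update is U_j = (i, x)
        A[i] += x            # A_i = A_i + x
    return A

# A made-up stream over n = 4 indices.
stream = [(3, 5), (1, 2), (3, -1), (4, 7)]
A = apply_updates(4, stream)
print(A[1:])  # [2, 0, 4, 7]
```

This needs space proportional to n, which is exactly what the compact summaries in the rest of the lecture try to avoid.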


    Special cases of the model [Muthukrishnan, 2005b]

    Time series model: the j-th update changes entry $A_j$

    Cash register: $x \geq 0$

    Turnstile model: most general model

    In this lecture

    Compact summaries of data streams (Count-Min sketches [Cormode and Muthukrishnan, 2005])

    Applications: point queries, range sums, heavy hitters, quantiles

    Only a drop in a sea of results...


    The CM sketch [Cormode and Muthukrishnan, 2005]

    In the sequel: cash-register model

    2-dimensional array whose size is determined by design parameters $\epsilon$ and $\delta$ (their meaning explained further)

    Array is $C[j, l]$, where $j = 1, \ldots, d$ and $l = 1, \ldots, w$

    $d = \lceil \ln(1/\delta) \rceil$ (depth), $w = \lceil e/\epsilon \rceil$ (width)

    Every entry initially 0

    $d$ hash functions $h_1, \ldots, h_d$ chosen uniformly at random from a pairwise-independent family (see first lecture)

    $h_r : \{1, \ldots, n\} \to \{1, \ldots, w\}$
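The sizing rule for the sketch is easy to evaluate numerically. The helper below is a small sketch of that rule; the function name `cm_dimensions` and the sample parameters are illustrative.

```python
import math

def cm_dimensions(epsilon, delta):
    """Return (d, w): depth d = ceil(ln(1/delta)), width w = ceil(e/epsilon)."""
    d = math.ceil(math.log(1.0 / delta))
    w = math.ceil(math.e / epsilon)
    return d, w

d, w = cm_dimensions(epsilon=0.01, delta=0.01)
print(d, w)       # 5 272
print(d * w)      # 1360 counters in total, independent of n
```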

    Update

    Pair (i, c) is observed, meaning that, ideally, Ai = Ai + c


    CM sketch: Update procedure

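    [Figure: CM sketch update. Item $i$ is mapped by $h_1, \ldots, h_d$ to one cell in each of the $d$ rows of the $d \times w$ array, and $+c$ is added to each of these cells.]

    CM sketch update

    update(i, c)

    Require: i: array index, c: value

    for j = 1, ..., d do
        C[j, h_j(i)] = C[j, h_j(i)] + c
    end for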
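The update procedure might look as follows in Python. This is a minimal sketch under stated assumptions: the hash family `((a*i + b) mod p) mod w` is a standard Carter-Wegman style choice swapped in for illustration (it is only approximately pairwise independent after the final `mod w`), and the class name, the prime and the seed parameter are not from the slides.

```python
import random

class CountMinSketch:
    PRIME = 2_147_483_647  # 2^31 - 1; assumed to exceed every item index

    def __init__(self, d, w, seed=None):
        rng = random.Random(seed)
        self.d, self.w = d, w
        self.C = [[0] * w for _ in range(d)]   # every entry initially 0
        # One (a, b) pair per row defines h_j(i) = ((a*i + b) mod p) mod w.
        self.ab = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                   for _ in range(d)]

    def _h(self, j, i):
        a, b = self.ab[j]
        return ((a * i + b) % self.PRIME) % self.w

    def update(self, i, c):
        """Add c to one counter in each of the d rows."""
        for j in range(self.d):
            self.C[j][self._h(j, i)] += c

sketch = CountMinSketch(d=5, w=272, seed=1)
sketch.update(42, 3)
sketch.update(42, 2)
print([sum(row) for row in sketch.C])  # [5, 5, 5, 5, 5]
```

Each update touches exactly d counters, so the update time does not depend on n.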


    Point query

    Basic query, building block for the others

    Q(i): estimate $A_i$

    Point query estimate

    PQ(i)

    Require: i: array index

    return $\hat{A}_i = \min_j C[j, h_j(i)]$

    Theorem ([Cormode and Muthukrishnan, 2005])

    $\hat{A}_i \geq A_i$. Furthermore, $P[\hat{A}_i > A_i + \epsilon \|A\|_1] \leq \delta$, where $\|A\|_1 = \sum_{i=1}^{n} |A_i|$ is the 1-norm of $A$.
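The one-sided guarantee of the theorem can be observed on a synthetic stream. The sketch below repeats the update step so that the block is self-contained; the hash family, the class name and the random test stream are assumptions made for the illustration, not part of the slides.

```python
import random
from collections import Counter

class CountMinSketch:
    PRIME = 2_147_483_647  # 2^31 - 1; assumed to exceed every item index

    def __init__(self, d, w, seed=None):
        rng = random.Random(seed)
        self.d, self.w = d, w
        self.C = [[0] * w for _ in range(d)]
        self.ab = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                   for _ in range(d)]

    def _h(self, j, i):
        a, b = self.ab[j]
        return ((a * i + b) % self.PRIME) % self.w

    def update(self, i, c):
        for j in range(self.d):
            self.C[j][self._h(j, i)] += c

    def point_query(self, i):
        """PQ(i): the minimum over rows of the counter that i hashes to."""
        return min(self.C[j][self._h(j, i)] for j in range(self.d))

# Cash-register stream: 10,000 unit increments over items 1..1000.
rng = random.Random(0)
stream = [(rng.randrange(1, 1001), 1) for _ in range(10_000)]

sketch = CountMinSketch(d=5, w=272, seed=1)
truth = Counter()
for i, c in stream:
    sketch.update(i, c)
    truth[i] += c

never_underestimates = all(sketch.point_query(i) >= truth[i] for i in truth)
print(never_underestimates)  # True: collisions can only add to a counter
```

With non-negative increments every counter that i hashes to contains at least $A_i$, so the minimum is the least contaminated of d overestimates.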


    Proof of theorem

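    Define $I_{ijk} = 1$ if $(i \neq k) \wedge (h_j(i) = h_j(k))$, 0 otherwise

    $P[h_j(i) = h_j(k)] \leq 1/w \leq \epsilon/e$ by pairwise independence

    Define $X_{ij} = \sum_{k=1}^{n} I_{ijk} A_k$

    $X_{ij} \geq 0$ and $C[j, h_j(i)] = A_i + X_{ij}$ $\Rightarrow$ $\hat{A}_i \geq A_i$

    $X_{ij}$ measures the error introduced by collisions

    $E[X_{ij}] = E\left[\sum_{k=1}^{n} I_{ijk} A_k\right] = \sum_{k=1}^{n} A_k E[I_{ijk}] \leq \frac{\epsilon}{e} \|A\|_1$

    Notice that the only random variables are the $I_{ijk}$'s and the $X_{ij}$'s

    Furthermore,

    $P[\hat{A}_i > A_i + \epsilon \|A\|_1] = P[\forall j : C[j, h_j(i)] > A_i + \epsilon \|A\|_1]$

    $= P[\forall j : A_i + X_{ij} > A_i + \epsilon \|A\|_1] \leq P[\forall j : X_{ij} > e E[X_{ij}]] < e^{-d} \leq \delta$,

    where the strict inequality follows from Markov's inequality, applied to each of the $d$ independently chosen rows.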
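The Markov step can be spelled out as follows. This is a sketch of the standard argument, assuming that the d hash functions are drawn independently of each other, so that the events for different rows are independent, and that $d \geq \ln(1/\delta)$:

```latex
\begin{align*}
P\big[X_{ij} > e\,E[X_{ij}]\big] &< \frac{1}{e}
  && \text{(Markov's inequality, for each fixed row } j\text{)} \\
P\big[\forall j : X_{ij} > e\,E[X_{ij}]\big]
  &= \prod_{j=1}^{d} P\big[X_{ij} > e\,E[X_{ij}]\big] < e^{-d}
  && \text{(rows are independent)} \\
e^{-d} &\leq e^{-\ln(1/\delta)} = \delta
  && \text{(since } d \geq \ln(1/\delta)\text{)}
\end{align*}
```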


    Join size

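    [Figure: join example. Column R.A holds the values 4, 2, 2, 3, 4, 2 and column S.A holds 1, 4, 2, 1, 4, 2; both are summarized by frequency vectors indexed by the values 1, 2, 3, 4. Other labels: R.B, R.C, Tuple.]

    Join of two relations R and S on same attribute A

    We want to compute COUNT($R \bowtie_A S$)

    Space linear in n required (in the picture: n = 4)

    In the example: COUNT($R \bowtie_A S$) $= 3 \cdot 2 + 2 \cdot 2 = 10$

    Underlying operation: compute scalar product of two non-negative vectors of same size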
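The reduction from join size to a scalar product of frequency vectors can be checked directly on the example. In the sketch below the two value lists are read off the slide's figure (which is garbled in this copy, so treat them as an assumption); the function name is illustrative.

```python
from collections import Counter

# Values of attribute A in the two relations, as read from the figure.
R_A = [4, 2, 2, 3, 4, 2]
S_A = [1, 4, 2, 1, 4, 2]

def join_size(r_values, s_values):
    """COUNT of the equi-join = inner product of the two frequency vectors."""
    freq_r, freq_s = Counter(r_values), Counter(s_values)
    return sum(count * freq_s[value] for value, count in freq_r.items())

print(join_size(R_A, S_A))  # 10 = 3*2 (value 2) + 1*0 (value 3) + 2*2 (value 4)
```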


    Inner product

    Given two vectors $A$ and $B$, we want to keep a compact summary of $A \cdot B$

    We use two CM sketches $C_A$ and $C_B$ for $A$ and $B$

    Estimation

    Let $S_j = \sum_{k=1}^{w} C_A[j, k]\, C_B[j, k]$

    Estimation: return $\hat{S} = \min_j S_j$

    Theorem ([Cormode and Muthukrishnan, 2005])

    $\hat{S} \geq A \cdot B$. Furthermore:

    $P[\hat{S} > A \cdot B + \epsilon \|A\|_1 \|B\|_1] \leq \delta$.

    Q1: Prove theorem above. Proof follows same lines as for point query
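A small sketch of the estimator, assuming (as the estimator requires) that both sketches are built with the same hash functions and the same dimensions. The helper names, the hash family and the tiny width are illustrative choices; the vectors are the frequency vectors of the join example.

```python
import random

PRIME = 2_147_483_647  # 2^31 - 1; assumed to exceed every index

def make_hashes(d, w, seed=None):
    """d hash functions i -> ((a*i + b) mod p) mod w, shared by both sketches."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(d)]
    return [lambda i, a=a, b=b: ((a * i + b) % PRIME) % w for a, b in params]

def build_sketch(vector, hashes, w):
    """CM sketch of a non-negative vector given as {index: value}."""
    C = [[0] * w for _ in hashes]
    for i, value in vector.items():
        for j, h in enumerate(hashes):
            C[j][h(i)] += value
    return C

def inner_product_estimate(CA, CB):
    """min over rows j of S_j = sum_k CA[j][k] * CB[j][k]."""
    return min(sum(x * y for x, y in zip(row_a, row_b))
               for row_a, row_b in zip(CA, CB))

A = {1: 0, 2: 3, 3: 1, 4: 2}
B = {1: 2, 2: 2, 3: 0, 4: 2}
d, w = 5, 8
hashes = make_hashes(d, w, seed=7)
CA, CB = build_sketch(A, hashes, w), build_sketch(B, hashes, w)

exact = sum(A[i] * B[i] for i in A)
estimate = inner_product_estimate(CA, CB)
print(exact, estimate >= exact)  # 10 True
```

For non-negative vectors each row sum contains every product $A_i B_i$ plus non-negative cross terms from collisions, so the estimate never falls below the exact value.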


    Heavy hitters [Cormode and Muthukrishnan, 2005]

    $\phi$-heavy hitters of $A$: $\{i : A_i > \phi \|A\|_1\}$

    No. of $\phi$-heavy hitters: between 0 and $1/\phi$

    Approximate heavy hitters: accept $i$ such that $A_i \geq (\phi - \epsilon) \|A\|_1$ for some specified $\epsilon < \phi$

    We consider the cash-register model

    Heavy hitters algorithm: ingredients

    CM sketch and point query basic building blocks

    Return items whose estimate exceeds $\phi \|A\|_1$

    Assume $c_s$ is the s-th update $\Rightarrow$ $\|A\|_1 = \sum_{s=1}^{t} c_s$ $\Rightarrow$ $\|A\|_1$ can be easily maintained and updated in small space
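For comparison, the exact definition is straightforward when the whole array is available. The function below is a linear-space reference, not the streaming algorithm; its name and the sample vector are made up.

```python
def exact_heavy_hitters(A, phi):
    """Return {i : A_i > phi * ||A||_1} for a vector given as {index: value}."""
    norm1 = sum(abs(v) for v in A.values())
    return {i for i, v in A.items() if v > phi * norm1}

A = {1: 50, 2: 30, 3: 15, 4: 5}               # ||A||_1 = 100
print(sorted(exact_heavy_hitters(A, 0.2)))    # [1, 2]
```

Since the selected entries each exceed $\phi \|A\|_1$ and together cannot exceed $\|A\|_1$, there are fewer than $1/\phi$ of them.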


    Heavy hitters: update and query

    update(i, c, H) {Update CM sketch}

    Require: i: array index, $c \geq 0$, H: heap

    $\hat{A}_i$ = PQ(i)
    if $\hat{A}_i > \phi \|A\|_1$ then
        HeapUpdate(i, $\hat{A}_i$, H) {Insert if $i \notin H$}
    end if
    s = HeapMin(H)
    while PQ(s) $\leq \phi \|A\|_1$ do
        HeapDelete(s)
        s = HeapMin(H)
    end while

    Heap and query

    Generic heap element: pair $(i, \hat{A}_i)$ ordered by $\hat{A}_i$

    Heavy hitters: return all elements $i$ in $H$ such that $\hat{A}_i > \phi \|A\|_1$
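The update and query procedures can be sketched in Python as below. Several choices here are assumptions rather than the slides' method: the class name, the hash family, the use of `heapq` with a companion set, and the simplification that heap keys are the estimates at insertion time and are not refreshed on later updates (the final query re-checks every stored item instead).

```python
import heapq
import random

class HeavyHitterTracker:
    PRIME = 2_147_483_647  # 2^31 - 1; assumed to exceed every item index

    def __init__(self, phi, d, w, seed=None):
        rng = random.Random(seed)
        self.phi, self.d, self.w = phi, d, w
        self.C = [[0] * w for _ in range(d)]
        self.ab = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                   for _ in range(d)]
        self.norm1 = 0        # running ||A||_1 = sum of all increments
        self.heap = []        # (estimate at insertion time, item)
        self.members = set()  # items currently stored in the heap

    def _h(self, j, i):
        a, b = self.ab[j]
        return ((a * i + b) % self.PRIME) % self.w

    def point_query(self, i):
        return min(self.C[j][self._h(j, i)] for j in range(self.d))

    def update(self, i, c):
        for j in range(self.d):                  # update the CM sketch
            self.C[j][self._h(j, i)] += c
        self.norm1 += c
        threshold = self.phi * self.norm1
        estimate = self.point_query(i)
        if estimate > threshold and i not in self.members:
            self.members.add(i)                  # insert if i is not in H
            heapq.heappush(self.heap, (estimate, i))
        # Evict heap minima whose current estimate is no longer above threshold.
        while self.heap and self.point_query(self.heap[0][1]) <= threshold:
            _, s = heapq.heappop(self.heap)
            self.members.discard(s)

    def heavy_hitters(self):
        threshold = self.phi * self.norm1
        return {i for i in self.members if self.point_query(i) > threshold}

# Synthetic cash-register stream: items 7 and 9 carry 60% and 25% of the mass.
rng = random.Random(0)
stream = [7] * 600 + [9] * 250 + [rng.randrange(100, 1000) for _ in range(150)]
rng.shuffle(stream)

tracker = HeavyHitterTracker(phi=0.2, d=5, w=272, seed=1)
for item in stream:
    tracker.update(item, 1)

result = tracker.heavy_hitters()
print({7, 9} <= result)  # True: true heavy hitters are never missed
```

Because estimates never underestimate and counts only grow, an item that is a true heavy hitter at query time was above the threshold at its last update and cannot have been evicted since; false positives remain possible.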


    Heavy hitters: performance

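    Theorem ([Cormode and Muthukrishnan, 2005])

    Assume inserts only (cash register model). With CM sketches using space $O\left(\frac{1}{\epsilon} \log \frac{n}{\delta}\right)$ and update time $O\left(\log \frac{n}{\delta}\right)$ per item:

    Every heavy hitter is output

    With probability at least $1 - \delta$: i) no item whose real count is $\leq (\phi - \epsilon) \|A\|_1$ is output and ii) the number of items in the heap is $O\left(\frac{1}{\phi - \epsilon}\right)$

    Question

    Assume $w = \lceil e/\epsilon \rceil$ and $d = \lceil \ln(n/\delta) \rceil$, and let $T$ be the estimated set of heavy hitters. Recall that $\hat{A}_i \geq A_i$. Assume that, after any number $t$ of insertions, $A_i < (\phi - \epsilon) \|A\|_1$. Prove that

    $P[i \in T] \leq \delta$.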


    Solution

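    Since $i \in T$, we have $\hat{A}_i > \phi \|A\|_1$ $\Rightarrow$ $\forall j : C[j, h_j(i)] > \phi \|A\|_1$.

    Hence:

    $A_i < (\phi - \epsilon) \|A\|_1$ $\Rightarrow$ $C[j, h_j(i)] - A_i > \epsilon \|A\|_1$, $\forall j$.

    This implies:

    $P[i \in T] \leq P[\hat{A}_i > A_i + \epsilon \|A\|_1] = P[\forall j : C[j, h_j(i)] > A_i + \epsilon \|A\|_1] < e^{-d} \leq \frac{\delta}{n}$,

    where the strict inequality follows from the general result seen for PQ(i). Finally, there are at most n items, so by the union bound:

    $P[\exists i : i \in T] \leq \delta$.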
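The final step is a union bound over the at most n items whose true count is below $(\phi - \epsilon)\|A\|_1$. Written out, with $i$ ranging only over such items:

```latex
\begin{align*}
P\big[\exists i : A_i < (\phi-\epsilon)\|A\|_1 \,\wedge\, i \in T\big]
  &\leq \sum_{i \,:\, A_i < (\phi-\epsilon)\|A\|_1} P[i \in T] \\
  &\leq n \cdot \frac{\delta}{n} = \delta .
\end{align*}
```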


    Cormode, G. and Muthukrishnan, S. (2005). An improved data stream summary: the count-min sketch and its applications. J. Algorithms, 55(1):58-75.

    Ipoque GMBH, G. (2007). Internet study 2007. URL: http://www.ipoque.com/.

    Muthukrishnan, S. (2005a). Data stream algorithms. URL: http://www.cs.rutgers.edu/muthu/str05.html.

    Muthukrishnan, S. (2005b). Data streams: Algorithms and applications. In Foundations and Trends in Theoretical Computer Science, Now Publishers or World Scientific, volume 1. Draft at author's homepage: http://www.cs.rutgers.edu/muthu/stream-1-1.ps.


    Odlyzko, A. M. (2003). Internet traffic growth: sources and implications. In Proc. of SPIE conference on Optical Transmission Systems and Equipment for WDM Networking, pages 1-15.