RelSamp: Preserving Application Structure in Sampled Flow Measurements
Myungjin Lee, Mohammad Hajjat, Ramana Rao Kompella, Sanjay Rao
A plethora of Internet applications
Objectives
  Re-provision networks
  Detect undesirable behaviors of applications
  Prepare the network better against major application trends
1) Emergence of new applications
2) Measure/Monitor
3) Characterization
Monitoring applications at an edge
  Goal: Monitoring application behavior
    Identify number of flows
    Identify number of packets
  Current solution: Sampled NetFlow
    Supported by most modern routers
  Key limitation: Application session structure gets distorted
    Small # of flows per application session
    Small # of packets per application session
[Figure: an enterprise network connected to the Internet through an edge router running Sampled NetFlow]
Preserving application structure in flow measurements
  Benefit 1: Enables continuous monitoring of applications
    Better understanding of communication patterns
    Better understanding of characteristics (# of flows, packets)
  Benefit 2: Application classification becomes easier
    Statistical machine learning techniques: SVM, C4.5, etc.
    Social behavior-based classifier: BLINC
  Benefit 3: Detecting undesirable traffic patterns of an application
Contributions
  Introduce the notion of related sampling
    Flows belonging to the same application session are sampled with higher probability
  Propose the RelSamp architecture for realizing related sampling
    Uses three stages of sampling to preserve application structure
  Show efficacy in preserving application structure
    Captures more flows per application session
    Significant increase in application classification accuracy
Related sampling
[Figure: three panels showing flows of App1, App2, and App3: original application structure, Sampled NetFlow, related sampling]
Key idea: Sample more flows from fewer application sessions
Realizing related sampling
Question 1: How to sample an application session?
Question 2: How to sample packets within an application session?
Defining application session
  A sequence of packets from an application on a given host with inter-arrival time ≤ τ seconds
    Packets may belong to different flows to different destinations
  Example 1: BitTorrent connections to several destinations within a short span of time constitute an application session
  Example 2: Web connections from a browser several seconds apart constitute different application sessions
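The inter-arrival rule above can be illustrated offline with a short Python sketch. It is only an illustration under stated assumptions: the function name `split_sessions` and the packet tuple layout are hypothetical, and packets are grouped per host only, since the application that emitted a packet is not directly observable.

```python
from collections import defaultdict

def split_sessions(packets, tau):
    """Group each host's packets into sessions by inter-arrival time.

    packets: iterable of (timestamp_seconds, src_host, flow_key) tuples
             (hypothetical layout, chosen for this sketch).
    tau:     maximum gap in seconds between consecutive packets of a session.
    Assumption: sessions are tracked per host, not per application.
    """
    sessions = defaultdict(list)  # host -> list of sessions (lists of packets)
    last_seen = {}                # host -> timestamp of its previous packet
    for ts, host, flow in sorted(packets, key=lambda p: p[0]):
        # A gap larger than tau (or a host's first packet) opens a new session.
        if host not in last_seen or ts - last_seen[host] > tau:
            sessions[host].append([])
        sessions[host][-1].append((ts, flow))
        last_seen[host] = ts
    return dict(sessions)
```

With tau = 2 seconds, packets from one host at t = 0, 1, and 10 fall into two sessions, even if they belong to different flows.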
Sampling an application session
  One possible approach: Similar to Sampled NetFlow
    Sample packets with some probability
    Create an application session record if no record exists
    Update the application session record
  Problem: Hard to do in an online fashion
    No application session identifier (like a flow key)
    Need to know all flows that constitute an application session
    DPI-based techniques are both difficult and incomplete
Our approach: sampling hosts
  Observation: A host is a superset of an application session
    Sample more flows from the same host
  Flows originating at the same host close together in time typically belong to a few application sessions
    About 80% of hosts run fewer than 2 applications in our study
    More details in the paper
RelSamp design
  Three-stage sampling process consisting of host, flow, and packet selection stages
  Host stage: hash-based sampling
    No state maintained on a per-application basis
    Many application sessions for a given host are possibly sampled
    Change the hash function periodically to track different hosts
  Flow and packet stages: random packet sampling
    Controls the fraction of flows sampled in an application session and of packets sampled in a flow
  Post-processing: Can separate flow records into application sessions using port-based/statistical classifiers
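A minimal Python sketch of hash-based host selection follows. It rests on assumptions that are not in the slides: SHA-256 stands in for whatever hash a router would use, the 32-bit `HASH_SPACE` is arbitrary, and "changing the hash function" is modeled by changing a hypothetical `salt` value.

```python
import hashlib

HASH_SPACE = 2 ** 32  # assumed size of the hash space (illustrative)

def host_hash(src_ip, salt):
    """Map a source IP to [0, HASH_SPACE); SHA-256 is a stand-in hash."""
    digest = hashlib.sha256(f"{salt}:{src_ip}".encode()).digest()
    return int.from_bytes(digest[:4], "big")

def host_selected(src_ip, p_h, salt):
    """Select the host if its hash lands in the first p_h fraction of the space.

    The decision is deterministic for a given (src_ip, salt), so every packet
    of a selected host passes the stage and no per-host state is needed.
    Changing salt models the periodic change of hash function.
    """
    return host_hash(src_ip, salt) < p_h * HASH_SPACE
```

Because selection depends only on the source IP and the current salt, all flows of a selected host remain eligible for sampling until the salt changes.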
RelSamp architecture
[Figure: packets pass through three stages with tunable parameters Ph, Pf, and Pp; records are kept in flow memory]
  Host-level bias stage (Ph)
    H(SrcIP) maps the source IP into the hash space; the host is selected if the hash falls in the selection range
    Ph = selection range / hash space
  Flow-level bias stage (Pf)
    if (random no. ≤ Pf && no flow record) create a flow record
  Pkt-level bias stage (Pp)
    if (random no. ≤ Pp && flow record) update the flow record
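The three stages can be sketched in Python as follows. This is not the authors' implementation: the class name `RelSampSketch`, the SHA-256 hash, and the `salt` parameter are hypothetical, and the sketch assumes that packets from unselected hosts are simply ignored and that the packet creating a flow record is counted once.

```python
import hashlib
import random

HASH_SPACE = 2 ** 32  # assumed hash space size (illustrative)

class RelSampSketch:
    """Illustrative three-stage sampler with parameters Ph, Pf, and Pp."""

    def __init__(self, p_h, p_f, p_p, salt=0, rng=None):
        self.p_h, self.p_f, self.p_p = p_h, p_f, p_p
        self.salt = salt                 # stands in for the current hash function
        self.rng = rng or random.Random()
        self.flow_memory = {}            # flow key -> sampled packet count

    def _host_selected(self, src_ip):
        # Host stage: Ph = selection range / hash space.
        digest = hashlib.sha256(f"{self.salt}:{src_ip}".encode()).digest()
        return int.from_bytes(digest[:4], "big") < self.p_h * HASH_SPACE

    def process(self, src_ip, flow_key):
        if not self._host_selected(src_ip):
            return  # assumption: unselected hosts are not sampled at all
        if flow_key not in self.flow_memory:
            # Flow stage: create a record with probability Pf.
            if self.rng.random() <= self.p_f:
                self.flow_memory[flow_key] = 1
        elif self.rng.random() <= self.p_p:
            # Packet stage: update an existing record with probability Pp.
            self.flow_memory[flow_key] += 1
```

A low Ph with a high Pf concentrates the flow memory on a few hosts while recording most of their flows, which is the bias that related sampling aims for.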
Exploring parametric space
  Router sampling budget: Pe = f(Ph, Pf, Pp)
  Trade-off between accuracy of flow statistics and # flows/application session
  Parameters can be tuned depending on
    Objective
    Network environment
  Examples of tuning parameters by objective
    Application classification: low Ph, high Pf, low Pp
    Application characterization: lower Ph, high Pf, high Pp
    Flow statistics of all flows: Ph = Pf = Pp = Pe
Evaluation goals
  Application characterization
    Question 1: Is RelSamp effective at sampling more flows in an application session?
    Question 2: Can RelSamp estimate statistics of an application session?
  Application classification
    Question 3: Is sampling more flows in an application session beneficial for application classification?
Experimental setup
  Evaluation of effectiveness for capturing more flows
    Trace 1: 1-hour packet trace collected at an edge
    RelSamp configuration (other settings in paper): Capture more flows of app session from many hosts , , ()
  Evaluation of application classification accuracy
    Trace 2: 13-hour full-payload trace captured at a dorm network
    RelSamp setting: Similar setting, but varies from 0.1 to 1.0
    Classifiers: BLINC [SIGCOMM ’05], SVM, and C4.5
    Ground truth is obtained using a DPI-based classifier (tstat)
Flows per application session
[Figure: CDF of #captured flows / #total flows in an app session; annotation: more flows per app session]
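The plotted metric, #captured flows / #total flows per application session, and its empirical CDF could be computed along the following lines. This is a sketch with hypothetical names (`capture_ratios`, `empirical_cdf`) and it assumes that sessions have already been identified in both the full and the sampled trace.

```python
def capture_ratios(total_flows, captured_flows):
    """Per-session ratio of captured flows to total flows.

    total_flows:    dict session_id -> set of flow keys in the full trace.
    captured_flows: dict session_id -> set of flow keys that were sampled.
    Sessions with no sampled flow get a ratio of 0.0.
    """
    return {
        sid: len(captured_flows.get(sid, set()) & flows) / len(flows)
        for sid, flows in total_flows.items()
        if flows  # skip empty sessions to avoid division by zero
    }

def empirical_cdf(values):
    """Return (value, cumulative fraction) pairs sorted by value."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]
```

A CDF curve that sits further to the right means a larger share of each session's flows was captured.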
Accuracy of BLINC classifier
[Figure: accuracy (%) vs. sampling rate; annotation: ~50% increase]
Note: classification results on flows using non-standard ports
Related work
  Flow Sampling [ToN ’06]
    Samples flows once a flow record is created
  Flow Slices [IMC ’05]
    Focuses on controlling router resources (CPU and memory)
  cSamp [NSDI ’08]
    Supports sampling of all traffic by coordinating various vantage points in a network
  FlexSample [IMC ’08]
    Supports monitoring of traffic subpopulations, but needs to maintain extra state for approximate checking of predicates
Summary
  Introduced the notion of related sampling
    Samples related flows in the same application session with higher probability
  Proposed the RelSamp architecture
    Preserves application structure in sampled flow records
  Effective at preserving application session structure
    5-10x more flows per application session compared to Sampled NetFlow
    Up to 50% higher classification accuracy than Sampled NetFlow
Thank you! Questions?
Evaluation method of classification techniques
[Figure: evaluation pipeline. Labels: Packet Trace; RelSamp, Sampled NetFlow, Flow Sampling; Flow Record 1, 2, 3; Classification Algorithm (e.g., BLINC, SVM, C4.5); DPI-based Classifier (Tstat); Ground Truth; Report]
Comparison with other solutions using BLINC
[Figure: # of accurately classified flows vs. sampling rate]
Note: classification results on flows using non-standard ports