Upload
hugo-jackson
View
215
Download
0
Embed Size (px)
Citation preview
Trajectory Sampling for Direct Traffic Oberservation
N.G. Duffield and Matthias Grossglauser
IEEE/ACM Transactions on Networking, Vol. 9, No. 3 June 2001
Problem: Which (spatial) path does traffic take?
Circuit switched networks (e.g. telephone):
Per-call state is maintained
=>trivial
IP networks:Don’t maintain per-
flow information
?
Why is this interesting? Quality of Service depends on traffic
management Traffic control
Timescale: seconds no human intervention
Traffic engineering Timescale: minutes - months Resource allocation Pricing Failover strategies
Options
Indirect measurement
Uses information on Network model Network state
Direct measurement
Direct observation of traffic at multiple points in the network
Problems with indirect measurement Behavior of network elements
depends on vendor-specific design choices
Deliberate sources of randomness to avoid collision
Events outside domain (route advertising by neighboring domains)
Interactions may be too complex to predict
Direct measurement:Sampling of packets Sample packets that traverse each
link Subset of packets used as
representative
Problem: How do we get the actual
path?
Key idea of the paper Use a deterministic hash function
over the packet’s content to determine subset of packets
Use the same hash function throughout the domain
Use second hash function to label packets
Theory Measurement domain represented as a
directed graph Packets
enter at ingress node exit at egress node
Invariance function Packet content without changing fields, e.g.time-to-live field which is decremented each
hop
Sampling Hash Function Decides whether or not a given
packet should be sampled Deterministic function of the
invariant packet content Same function on each link Results in L-bit binary number
Identification Hash Function Entire packet content could be used Aim: limit traffic to measurement
collection system Results in m-bit binary number Additional information may be
included Length of packet Source, destination
Invariant content Header: three categories of fields
Variable fields (not included) E.g., TTL, header checksum, etc.
Low entropy fields (not included) Content changes little between packets E.g., version, header length, protocol
High entropy fields (included) Source and destination IP, etc.
Part of remainder of packet
Ambiguities (f-h)
Dealing with ambiguities Probability that trajectory can be
disambiguated depends on network topology and traffic => renormalization of results necessary
Safer to discard all duplicate labels (greater loss of samples)
Specification of Hash Functions Ordered bits of invariant part of
packet content x are considered as binary integers: (x)
Sampling hash functionh((x)) = (x) mod A
Identification hash functiong((x)) = (x) mod B
with A, B positive integers
Identical Packets Automatically ambiguous
=> lead to biased estimatorsQuestion: How much packet content is
needed to avoid collisions?Answer: 40 bytes lead to collision
probability smaller than 10-3
Implementation of hashing 40 byte “numbers” are represented by
vector of 16 bit wordsz = (zk ,zk-1,…,z0) = i=0
k zi 216i
Use 32 bit long division Iteratively compute
(zk ,zk-1,…,z0) mod A
= (zk-1+ 216(zk mod A),…,z0) mod A
Sampling independent of packet content? Note: IP address of source and
destination are included in the invariant content!
Chi-squared test 40 byte packet prefix
=> 95% confidence level 20 byte packet prefix results in strong
dependence
Optimal Sampling Tradeoff
More unambiguous samples => more accuracy
More samples => more measurement traffic
Optimize for given measurement traffic mn(m bits per sample, n samples) Small m increases collisions Large m means smaller n
(Question to the authors Doesn’t the measurement traffic
itself get sampled and thereby add another source of error?
… may be part of their future work statement)
ExampleService provider wants to determine what
fraction of packets on a certain backbone link belongs to a certain customer
Compare customer packets observed both on backbone
and on access link Total number of packets observed on
backbone Real and estimated fractions largely
within error bars
Implementation issues Can trajectory sampling be part of next
generation of high-speed interfaces? Authors claim “yes”:
Compute both hash functions in parallel Processor cost negligible compared with
cost of interface cards Processor speed doubles every 18 months,
maximum trunk speed every 21 months
Other Common Approaches Aggregation-
based approaches e.g., sum of
packets traversing a link
Sampling-based approaches sample subset of
observations
Aggregation-based Approaches Link measurements (direct)
Traffic statistics (# of bytes / # of packets transferred / dropped)
Measurements reported periodically Flow aggregation (indirect)
Flow: sequence of packets with common field in header
Relies on emulation of routing protocol
Sampling-based Approaches Active end-to-end probes (direct)
Hosts send probe packets to one or more other hosts
Packet loss rate Round-trip delay
End-to-end path characteristics Variation: collect and exchange
measurements of multicast session
Related Work Measure end-to-end performance
of individual flows ATM cells sampled at ingress and
egress points Determine QoS for a single
connection, e.g., delay and loss rate
Extensions and Other Applications Distributed denial of service attacks
Attackers use packet spoofing Filtering
A configurable packet filter may allow trajectory sampling for a subset of packets
Probe Packets Packet content may be constructed to
ensure sampling
Conclusions Simple processing No Router state required Packets directly observed