1 20 February 2006
Measuring the Internet: State of the Art and Challenges
February 2006
Matti Siekkinen [[email protected]]
Institut Eurecom, Sophia Antipolis, France
Background and motivation: The Internet
Part 1: Different measurement approaches and techniques
- Passive vs. active
- On-line vs. off-line
- Aggregation level
- Hardware vs. software
- Storage methods
- Environment
- Case study 1: InTraBase
- Case study 2: Gigascope
Part 2: The analysis
- Traffic characterization and modeling
- Network characterization and modeling
- Anomaly detection
- Case study on TCP Root Cause Analysis
What is the Internet?
A collection of computers capable of communicating with each other using a standard set of protocols (TCP/IP)
An internetwork: made up of numerous networks
- Each network comprises numerous hosts and routers
- Hosts are endpoints, routers are internal way-stations
- Connections between hosts are links
Packet network with best effort delivery
What is the Internet?
Internet Service Providers (ISPs) are classified into tiers based on size and capacity:
- Tier 1: global reach, ~20 (British Telecom (BT), Cable & Wireless, Global Crossing, Level 3, Sprint, MCI (UUnet), Verio (NTT), ...): the backbone of the Internet
- Tier 2: regional, ~3000
- Tier 3: local, ~17000
Each ISP's distinct network forms an AS (Autonomous System)
~180,000 reachable networks
Why do we need to measure it?
The Internet was not designed for its current usage
- Originally designed as a resilient military research network
- No built-in security, QoS, nothing...
Operators want to correctly provision their networks
- Modeling traffic
- Modeling user behavior
Operators want to manage their networks
- Load balancing
- Identify bottlenecks
And if the network is not correctly provisioned...
- Identify misconfigured devices (e.g. routers)
Why do we need to measure it?
Guide Internet application development
- Model the traffic
- Perform empirical studies
Simulations are not always sufficient
Security-related issues
- Using honeypots to understand attack processes
- Intrusion Detection Systems (IDS)
Why is all this very challenging?
The Internet has no built-in measurement mechanisms
- The end-to-end arguments (J. H. Saltzer, D. P. Reed, and D. D. Clark. End-to-end arguments in system design. ACM Transactions on Computer Systems, 1984)
- The network is stupid; the intelligence is at the edges
The Internet is a constantly moving target
- Traffic volumes are ever increasing
- Dominating applications: before, HTTP (Web) and FTP; now, P2P; tomorrow: ?
- Access link capacities: a few years ago (in Europe), 512 kbit/s; now, > 8 Mbit/s
- More and more mobility
There is no such thing as "typical"
Why is all this very challenging?
Traffic volumes are very large
- Methods need to be scalable
Traffic data is sensitive
- Legal issues: privacy
- Business: ISPs are reluctant to disclose any information
- Security: attackers get the same knowledge (however: no security through obscurity)
Part 1: Different measurement approaches and techniques
- Passive vs. active
- On-line vs. off-line: DSMSs
- Aggregation level: sampling
- Hardware vs. software
- Storage methods: flat files vs. DBMS vs. data warehouse
- Environment: wired vs. wireless; backbone vs. WAN vs. LAN
- Case study 1: InTraBase (Integrated Traffic Analysis using a DBMS): database system for passive, off-line measurements and analysis
- Case study 2: Gigascope: passive, on-line packet monitoring platform (a kind of DSMS...)
Measurements: Passive vs. active
Passive
- Observe and record the traffic as it passes by
- Useful for characterizing Internet traffic
- ☺ Measures real traffic
- ☺ Does not perturb the network
- ☹ No control over the measurement process
Active
- Inject packets into the network, follow them, and measure the service obtained
- Useful for inferring network characteristics
- ☺ Full control over the measured traffic
- ☺ Important for available bandwidth and link capacity estimation techniques
- ☹ Often needs access to two measurement points at strategic locations
- ☹ Can perturb the network
Measurements: On-line vs. off-line
On-line
- Perform (at least part of) the analysis on the observed traffic in real time
- Often necessary when handling very large amounts of traffic: e.g. monitoring one Abilene Internet2 backbone link (OC-192, 10 Gbit/s) produced > 8 MBytes/s of uncompressed packet headers
- ☺ Data reduction: no need to store everything
- ☺ Results right away: can react immediately
- ☹ Efficient solutions can be complex and/or proprietary (Gigascope)
- ☹ Do not necessarily have all the raw packet data for later analysis
- See also the lecture "DSMS for Network Monitoring" by Prof. V. Goebel
Off-line
- Capture traffic into trace files and analyze later
- ☺ Possible to run complex, time-consuming analysis
- ☺ Simple and cheap solutions exist (e.g. tcpdump)
- ☹ Not applicable for time-critical scenarios
- ☹ Storage can become an issue
Measurements: Aggregation level
Packets
- Capture whole or partial packets (e.g. only TCP/IP headers)
- ☺ Have it all
- ☺ Can construct connection-level data and/or do detailed packet-level analysis
- ☹ Storage requirements
- ☹ Analysis is resource consuming and thus slow
Flows
- Usually grouped by timeouts and/or maximum packet counts and the five-tuple: (src IP, dst IP, src port, dst port, layer 3 protocol)
- Cisco's NetFlow (latest version 9): flows are computed in NetFlow-enabled devices (e.g. routers); 7 keys define a flow: src & dst addresses, src & dst ports, layer 3 protocol, TOS (type of service) byte, input interface; a flow ends (by default) after a 15 s inactive timeout, a 30 min active timeout, or when the flow cache is full ⇒ not true connections
- ☺ Relieves memory requirements for on-line measurements
- ☹ Connection-level analysis needs reassembly
- ☹ Lose packet-level knowledge
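The timeout-based flow grouping above can be sketched in a few lines of Python. This is a simplified illustration (five-tuple key and inactive timeout only, not real NetFlow), and the packet tuples are invented for the example:

```python
# Minimal NetFlow-style aggregation sketch: group packets into flows by
# five-tuple, ending a flow after an inactive timeout (hypothetical data).
INACTIVE_TIMEOUT = 15.0  # seconds; NetFlow's default inactive timeout

def aggregate_flows(packets, timeout=INACTIVE_TIMEOUT):
    """packets: iterable of (ts, src, dst, sport, dport, proto, size),
    sorted by timestamp. Returns a list of flow records."""
    active = {}   # five-tuple -> [first_ts, last_ts, packet_count, byte_count]
    flows = []
    for ts, src, dst, sport, dport, proto, size in packets:
        key = (src, dst, sport, dport, proto)
        rec = active.get(key)
        if rec is not None and ts - rec[1] > timeout:
            flows.append((key,) + tuple(rec))   # flow expired: export it
            rec = None
        if rec is None:
            active[key] = [ts, ts, 1, size]
        else:
            rec[1] = ts        # refresh last-seen timestamp
            rec[2] += 1
            rec[3] += size
    for key, rec in active.items():             # export remaining flows
        flows.append((key,) + tuple(rec))
    return flows

pkts = [
    (0.0, "10.0.0.1", "10.0.0.2", 1234, 80, 6, 1500),
    (0.5, "10.0.0.1", "10.0.0.2", 1234, 80, 6, 1500),
    (30.0, "10.0.0.1", "10.0.0.2", 1234, 80, 6, 40),  # > 15 s gap: new flow
]
flows = aggregate_flows(pkts)
```

The same five-tuple appears twice in the output because the inactive timeout split it, which is exactly why NetFlow records are "not true connections".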
Measurements: Aggregation level
Connections
- TCP connections
- ☺ True connection-level data enables full-scale end-to-end analysis
- ☹ Tough memory requirements for on-line analysis
- ☹ When is a TCP connection finished?
- ☹ Lose packet-level knowledge
Sampling
- Do not record each packet
- Use statistical methods to estimate e.g. flow sizes
- For the interested: see the work of N. Duffield (AT&T Labs-Research)
- ☺ Utilizes fewer resources
- ☹ Trades off some accuracy
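The basic idea behind sampling can be illustrated with the simplest scheme, uniform packet sampling with inverse-probability scaling. This is a sketch, not Duffield's actual estimators:

```python
import random

def sampled_byte_estimate(packet_sizes, p, seed=42):
    """Keep each packet with probability p; estimate the total byte count
    by scaling each sampled packet's size by 1/p (Horvitz-Thompson style).
    The estimate is unbiased, at the cost of some variance."""
    rng = random.Random(seed)
    est = 0.0
    for size in packet_sizes:
        if rng.random() < p:
            est += size / p
    return est

# Synthetic trace: 10,000 packets of 1500 bytes each (15,000,000 bytes)
sizes = [1500] * 10000
true_total = sum(sizes)
est = sampled_byte_estimate(sizes, p=0.1)   # keeps roughly 1 in 10 packets
```

With a 10% sampling rate the estimate typically lands within a few percent of the true total here, which is the accuracy-for-resources trade mentioned above.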
Measurements: Hardware vs. software
Hardware
- Endace's DAG cards: only for packet capturing
- Network processors (Intel's IXP cards): programmable devices for packet processing
- ☺ Fast ⇒ can measure/process Gigabit and faster links
- ☺ Accurate: GPS-synchronized accurate timestamping (DAG); fewer packet drops and corruption
- ☹ Often very expensive
- ☹ Can be non-trivial to use
Software
- tcpdump and friends
- ☺ Cheap (free)
- ☺ Easy to use
- ☹ Slow
- ☹ Precision may be too low for certain use cases
Measurements: Storage methods
Plain files
- The most common method
- ☺ Simple
- ☹ Management can become problematic with large amounts of measurements and analysis results
- ☹ Non-reproducible results
Database
- Store measurements and/or metadata and/or analysis results
- ☺ Solves many management problems
- ☺ Correct design can bring good performance (indexes etc.)
- ☹ Some overhead in disk space usage and processing
- ☹ Requires an initial investment to master
Data warehouse
- Collect, summarize, and organize data within a database to facilitate and speed up further processing, commonly querying
- A DBS optimized for analysis and/or data mining: complex queries extract information that reveals correlations between different data sources
- Create collections ("views") of data from the raw data by aggregation
- The warehouse can be periodically reconstructed, periodically updated from the sources, or updated upon changes in the source data
- ☺ Enables mining large amounts of data
- ☹ The quality of analysis depends on the quality of the defined views
- ☹ It takes time to master DW techniques
Measurement environment: Wired or not?
Wired
- Everything else in the Internet is wired except maybe the last hops
- (Typically) FIFO scheduling with drop-tail
- RIP and OSPF for intra-domain routing
- BGP for inter-domain routing
Wireless
- GPRS & UMTS
- 802.11 (Wi-Fi)
- Can be non-FIFO scheduling
- Different routing mechanisms (e.g. ad-hoc networks)
Need to focus on different issues
Measurement environment: Global vs. Local
Wide area Internet traffic
- ISP or university edges
- Backbone
Local
- Within an ISP
- Within a university
- Within an enterprise
Traffic characteristics are different
- Application sets
- Traffic volumes
Network characteristics are different
- Link capacities
- Delays
Need to focus on different issues
Case study 1: InTraBase
Database system for passive, off-line measurements and analysis
References:
- M. Siekkinen, E. W. Biersack, V. Goebel, T. Plagemann, and G. Urvoy-Keller. InTraBase: Integrated Traffic Analysis Based on a Database Management System. E2EMON 2005.
- M. Siekkinen, V. Goebel, and E. W. Biersack. Object-Relational DBMS for Packet-Level Traffic Analysis: Case Study on Performance Optimization. E2EMON 2006.
Case study 1: InTraBase
Outline
- Motivation
- Our InTraBase approach
- First prototype of InTraBase: performance evaluation
- Conclusions
Case study 1: InTraBase
Motivation
Current situation in off-line traffic analysis:
- The state of the art is handcrafted scripts and numerous specialized software tools
- Traffic analysis is an iterative process
- Large amounts of data for analysis
Case study 1: InTraBase
Motivation
Resulting problems:
1. Management: data, metadata, and tools; getting lost among files containing data and ad-hoc scripts
2. Analysis cycle: data loses semantics and structure
3. Scalability: cannot even analyze 10 GB data sets
(Figure: the analysis cycle: filter, process, combine, store, interpret, define a new task)
Case study 1: InTraBase
The approach
- Perform traffic measurements and store the results in files
- Upload base data into the DB and process it within the DB
- Issue SQL queries; use the extensibility of the DBMS to create functions for advanced processing
- Base data: tcpdump packet traces, application logs, ...
Case study 1: InTraBase
The approach
(Figure: InTraBase architecture: raw base data files (tcpdump packet traces, Web100 logs, application logs) captured on the network link are preprocessed and uploaded into the DBMS; off-line analysis issues queries against the base data, descriptions, functions, and results, organized as a data warehouse.)
Case study 1: InTraBase
Benefits of a DBMS-based approach
- Support from the DBMS to organize and manage data, related metadata, analysis results, and tools
- The database consists of reusable components ⇒ performing a new analysis is less laborious and error-prone
- Data becomes structured and conserves its semantics
- Processing and updating data is easier
- Searching is more efficient (indexes)
- Reusable intermediate results can be stored
- It is easier to combine different data sources: e.g. application-level events explain some phenomena in the traffic at the TCP layer
Case study 1: InTraBase
Drawbacks
- Initial investment to master a DBMS
- Elevated processing time and disk space consumption
- For simple tasks with small datasets, simple tools such as tcptrace are sufficient
Case study 1: InTraBase
A prototype of InTraBase
- Analyze TCP traffic from tcpdump packet traces with PostgreSQL
- PostgreSQL: an object-relational DBMS; allows extending the functionality with new functions; large user community ⇒ support
Case study 1: InTraBase
Processing a tcpdump file
A pcap or DAG trace file is converted with a modified tcpdump or dagdump (which enforces structure and adds connection ids) and loaded into the DBS with psql copy:
1. Copy packets from the file into the packets table
2. Build an index for the packets based on cnxid
3. Create connection-level statistics into the connections table
4. Insert the unique 4-tuple-to-cnxid mappings into the cid2tuple table
Case study 1: InTraBase
Prototype base table layout
Case study 1: InTraBase
Prototype functions
Contains a set of functions:
- pl/pgSQL: produce time series (packet inter-arrival times, throughput, ...); plot time-sequence diagrams in xplot format; ...
- pl/R: produce graphs; statistical calculations
Case study 1: InTraBase
Histogram of the packet inter-arrival times of the fastest connection

SELECT plot_ts_hist('SELECT * FROM iat(t2.cnxid, t2.reverse, ''packets'')', 'histogram.pdf')
FROM (SELECT cnxid, reverse
      FROM cnxs, (SELECT max(throughput) FROM cnxs) AS t1
      WHERE cnxs.throughput = t1.max) AS t2;

1. Find the connection with the highest throughput from the connections table
2. Fetch the packets of this connection from the packets table
3. Compute the inter-arrival times from the timestamps
4. Plot the histogram
(Figure: the same query illustrated: the connections table (connection id, bytes, packets, tput, ...) yields the fastest connection; its rows in the packets table (connection id, timestamp, start #seq, end #seq, flags, ...) feed iat(...), whose output plot_ts_hist() renders into histogram.pdf.)
Case study 1: InTraBase
Analysis of the feasibility
- Processing time
- Disk space consumption
Test data:
- BitTorrent traffic files: a few large connections
- Mixed Internet traffic files: lots of small connections
Tests run on Linux 2.6.3, 2x Intel Xeon 2.2 GHz, SCSI RAID
(Figures: total processing time vs. file size; disk space consumption)
Case study 1: InTraBase
Analysis of the feasibility
- Processing time is good enough: files larger than 10 GB are rarely processed (overnight)
- An overhead of 50% in disk space is acceptable: usually not an issue nowadays; the price to pay for having structured data
Case study 1: InTraBase
Conclusions
Experiences with a flat-file-based approach:
- Management becomes an issue: measurements, derived data, results, and tools
- Analysis tasks become cumbersome
⇒ Need a new approach
A DBMS-based approach:
- Solves many of the management problems
- The prototype scales reasonably well; overheads are acceptable
Usages:
- Helps a lot in everyday analysis work
- Ongoing work with France Telecom on performance monitoring/troubleshooting of their ADSL platform: a trace captured at the edge of the ADSL platform is processed periodically; GUI in production...
Case study 2: Gigascope
How to monitor network traffic at 5 Gbit/s
References:
- Chuck Cranor, Theodore Johnson, and Oliver Spatscheck. Gigascope: A Stream Database for Network Applications. SIGMOD 2003.
- C. Cranor, T. Johnson, O. Spatscheck, and V. Shkapenyuk. The Gigascope Stream Database. IEEE Data Engineering Bulletin 26(1):27-32, 2003.
Case study 2: Gigascope
Outline
Motivation
Features
GSQL
Performance
Case study 2: Gigascope
Motivation
Objectives: to manage the network's
- Security
- Reliability
- Performance
Requirements for the network monitoring tool:
- Flexibility and performance, in order to be reactive
- Achieve high speeds: GETH, OC-48 (2.48 Gbit/s full duplex), OC-192 (9.92 Gbit/s full duplex)
- Be inexpensive
- Allow retrofitting
- Be reliable
- Be remotely manageable
- All 7 layers need to be monitored
Case study 2: Gigascope
Motivation
- tcpdump and off-line data analysis (= InTraBase) is out of the question: it does not scale to even moderate line rates for on-line monitoring, and is expensive due to disk cost and space
- Standard tools such as SNMP, RMON, Netflow: do not cover layer 7 in a scalable fashion; do not allow flexible changes to the aggregation type
- Proprietary vendor products (sniffer.com, NARUS, handcrafted libpcap tools, ...): do not allow flexible changes to the aggregation type
Case study 2: Gigascope
Features
- A fast, flexible packet monitoring platform: on-line, passive, a custom DSMS, with a flexible SQL-based interface
- Claimed to be able to monitor at speeds up to OC-48 (2 x 2.48 Gbit/s) with a single probe
- Supports (in 2003): Netflow-compatible records; detection of hidden P2P traffic; end-to-end TCP performance monitoring; detailed custom performance statistics
Case study 2: Gigascope
Features
Data gathering steps:
1. Raw data feed from an optical splitter, an electrical splitter, or a monitoring/SPAN port
2. Extract aggregated records from the data feed in real time
3. Store data on a local RAID
4. Copy the data, in real time or during off-peak hours, to a data warehouse for further analysis: SSH-secured back channel; tools allow rate limiting to prevent the Gigascope from flooding the network
5. Analyze the data and/or join it with other data feeds using Daytona or other tools in the data warehouse
6. Display the result, use it for alarming, or generate customized reports
Case study 2: Gigascope
GSQL
- GSQL is the query language for Gigascope: similar to SQL, with support for stream database queries
- Stream fields can have ordering properties: used to deduce when aggregates are closed and can thus be flushed to the output stream
- Each query receives one or more tuple streams as input and generates one tuple stream as output
- Currently (2003) limited to selection, aggregation, views, joins, and merges
- The query compiler maps the logical query topology to an optimized FTA processing topology
Case study 2: Gigascope
Conclusions
- Gigascope is a packet monitoring platform
- Main advantages: fast (at least claimed to be); flexible through an SQL-like query language; data reduction
- Disadvantages: proprietary; usually needs another solution for data storage (Daytona)
Part 2: The analysis
Traffic characterization and modeling
- TCP
- P2P
Network characterization and modeling
- Topology discovery
- Network coordinates
- Measuring link capacities and available bandwidth
- Traffic matrices
Anomaly detection
- Network troubleshooting
- Intrusion detection
Case study on TCP Root Cause Analysis
Traffic characterization and modeling
Characterize and model traffic on various layers:
- Application layer: P2P, on-line games, Skype, WWW, ...
- Transport layer: TCP, (UDP)
- IP layer
- MAC layer: wireless environments
- Physical layer: especially wireless links
Objectives:
- First understand, then compute and predict network and application performance and user behavior
- Provision and troubleshoot networks
- Guide application development (e.g. caching)
- Provide accurate workload models for simulations
- Identify different applications: needed for the first objectives too; enforce rules and regulations
Traffic characterization and modeling: TCP
TCP carries over 90% of the bytes in the Internet
Modeling TCP:
- Express the performance of a TCP transfer as a function of parameters that have a physical meaning
- Parameters: packet loss rate (p), round-trip time (RTT) of the TCP connection, the receiver advertised window, the slow start threshold, the initial window size, the window increase rate, etc.
- Performance metrics: throughput, latency, fairness index, etc.
- E.g. the Square Root Formula (Mathis et al. 1997):

  Tput = (MSS / RTT) * sqrt(3 / (2p))

More advanced modeling:
- Advanced models for loss processes
- Queuing theory
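A quick numerical illustration of the Square Root Formula (the parameter values below are made up):

```python
from math import sqrt

def mathis_tput(mss_bytes, rtt_s, loss_rate):
    """Square Root Formula (Mathis et al. 1997): an upper bound on the
    throughput of a long-lived TCP connection, in bytes per second."""
    return (mss_bytes / rtt_s) * sqrt(3.0 / (2.0 * loss_rate))

# Example: 1460-byte MSS, 100 ms RTT, 1% packet loss
tput = mathis_tput(1460, 0.100, 0.01)   # roughly 1.4 Mbit/s
```

Note how throughput falls with the square root of the loss rate: quadrupling the loss halves the achievable rate.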
Traffic characterization and modeling: TCP
Empirical approach:
- Infer techniques from observations of real Internet traffic
- Less maths; more intuitive and simple models
- Apply a tool or an algorithm to real packet traces and analyze the results
Examples:
- Studying the burstiness of TCP traffic: Why is the Internet traffic bursty in short (sub-RTT) time scales? H. Jiang, C. Dovrolis. SIGMETRICS 2005.
- TCP Root Cause Analysis (case study at the end): how to identify the cause that prevents a TCP connection from achieving a higher throughput?
More in the lecture "Operational analysis of TCP in the wild" by Dr. Guillaume Urvoy-Keller
Traffic characterization and modeling: P2P
Identifying P2P traffic:
- Need to identify it before it can be characterized...
- Regulations and rules (RIAA)
- Not trivial, since P2P traffic can hide behind other TCP ports (e.g. 80) to circumvent filtering firewalls and legal issues
Identification by well-known TCP ports:
- ☺ Fast and simple
- ☹ May capture only a fraction of the total P2P traffic
Searching for application-specific keywords in packet payloads:
- ☺ Generally very accurate
- ☹ A set of legal, privacy, technical, logistic, and financial obstacles
- ☹ Need to reverse engineer poorly documented P2P protocols
- ☹ Payload encryption is increasingly supported in P2P protocols
Transport-layer identification:
- Transport Layer Identification of P2P Traffic. T. Karagiannis, A. Broido, M. Faloutsos, kc claffy. IMC 2004.
- Observe the connection patterns of source and destination IPs
- ☺ Identifies > 95% of P2P flows and bytes, with 8-12% false positives
- ☹ Limited by knowledge of the existing connection patterns
Traffic characterization and modeling: P2P
Characterizing and modeling P2P application traffic:
- Improve the performance and scalability of P2P applications
- Evaluate their impact on the network
Build mathematical models for the behavior and verify them against real traffic:
- A mathematical model enables accurate analysis
- E.g. Modeling and Performance Analysis of BitTorrent-Like Peer-to-Peer Networks. D. Qiu, R. Srikant. SIGCOMM 2004.
Empirical analysis of P2P traffic:
- "Tune the knobs" and draw conclusions from what you observe
- E.g. Dissecting BitTorrent: Five Months in a Torrent's Lifetime. M. Izal, G. Urvoy-Keller, E. W. Biersack, P. A. Felber, A. Al Hamra, L. Garces-Erice. PAM 2004.
See also the lecture "P2P systems for file replication" by Prof. Ernst Biersack
Network characterization and modeling: Topology discovery
- The art of finding out how the network is laid out
- Non-trivial knowledge in large-scale networks
Why do this?
- It is fun ☺
- Realistic simulation and modeling of the Internet
Some examples of methods:
- SNMP: works only locally
- Traceroute@home: do traceroute on a large scale; coordinate the efforts to be highly effective while avoiding unnecessary load on the network; see http://tracerouteathome.net/
- Skitter: ICMP ECHO-REQUEST probes (= traceroute) from 30-40 monitors to measure delay and the IP path; gathers actively used IP addresses from a number of sources (backbone packet traces, NeTraMet traces, NetGeo, CAIDA website hits, ...)
Network characterization and modeling: Network coordinates
- Express the communication latency, the "distance", in virtual coordinates
- Enables predicting round-trip times to other hosts without having to contact them first
- Useful for selecting a mirror server or peers in P2P systems
General approach:
1. Select a subset of hosts as reference points (RPs): creates the origin of the coordinate system
2. Measure the round-trip time (distance) between RPs
3. Calculate coordinates for each RP
4. Measure the RTT between the host and the RPs
5. Calculate coordinates for the host
Different techniques have been proposed for steps 1, 3, and 5
Reference points are also called landmarks, lighthouses, or beacons
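Steps 4 and 5 can be sketched as a toy 2D embedding: given RP coordinates and measured RTTs, pick host coordinates that minimize the squared error between Euclidean distance and RTT. This is a simple gradient-descent sketch with synthetic data; real systems use more refined solvers:

```python
from math import hypot

def embed_host(rp_coords, rtts, steps=2000, lr=0.01):
    """Find 2D coordinates for a host so that its Euclidean distance to
    each reference point approximates the measured RTT (step 5 above)."""
    # Start at the centroid of the reference points
    x = sum(c[0] for c in rp_coords) / len(rp_coords)
    y = sum(c[1] for c in rp_coords) / len(rp_coords)
    for _ in range(steps):
        gx = gy = 0.0
        for (rx, ry), rtt in zip(rp_coords, rtts):
            d = hypot(x - rx, y - ry) or 1e-9
            err = d - rtt                 # signed distance error
            gx += err * (x - rx) / d      # gradient of 0.5 * err**2
            gy += err * (y - ry) / d
        x -= lr * gx
        y -= lr * gy
    return x, y

# Three reference points at known coordinates; the RTTs are generated
# from a host that actually sits at (3, 4) in this synthetic example
rps = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true = (3.0, 4.0)
rtts = [hypot(true[0] - rx, true[1] - ry) for rx, ry in rps]
est = embed_host(rps, rtts)
```

Once embedded, the RTT to any other embedded host can be predicted as the distance between coordinates, without contacting it first.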
Network characterization and modeling: Link capacities and available bandwidth
What?
- Infer the maximal throughput that a TCP transfer can achieve...
- ...at a given time instant (available bandwidth) or when there is no other traffic (capacity)
- ...on a specific link or on an entire path
Why?
- Network-aware applications
- Route selection in overlay networks
- QoS verification
- Traffic engineering
How?
- Generally use active probing: inject packets with a specific traffic pattern and observe the pattern at the other end
- See the lecture "Measuring link capacities in the Internet" by Dr. Guillaume Urvoy-Keller
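One classic active-probing technique, packet-pair dispersion, illustrates the idea: two back-to-back probes of size L leave the bottleneck spaced by L/C, so the capacity C can be recovered from the measured spacing. The dispersion values below are invented, and taking the minimum over pairs is only a crude way to filter out cross-traffic noise:

```python
def capacity_from_pairs(probe_size_bytes, dispersions_s):
    """Packet-pair capacity estimate: C = L / delta, in bits per second.
    Taking the minimum over many pairs discards dispersions that were
    inflated by queueing behind cross traffic."""
    return probe_size_bytes * 8 / min(dispersions_s)

# 1500-byte probes; measured dispersions in seconds, one of them
# inflated by cross traffic
dispersions = [0.00125, 0.00130, 0.00122, 0.00190]
cap = capacity_from_pairs(1500, dispersions)   # close to a 10 Mbit/s link
```

Available-bandwidth tools follow the same active-probing pattern but vary the probe rate and look for the point where the output pattern starts to deviate from the input.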
Network characterization and modeling: Traffic matrices
- A traffic matrix is a network-wide view of the traffic
- It represents, for every ingress point i into the network and every egress point j out of the network, the volume of traffic T(i,j) from i to j over a given time interval
- Input for capacity planning and engineering: routing, link capacities, traffic engineering, etc.
Problem: it cannot be measured directly
- Flow-level measurements at the ingress points can generate terabytes of data per day
Solution: estimate it
Network characterization and modeling: Traffic matrices
(Figure: a toy topology: edge nodes A, B, C, D connected through a core node E, with example link loads and the resulting src/dst traffic matrix)
- Link loads AE, BE, EC, ED are obtained using SNMP
- Link ED = AD + BD, link AE = AD + AC, ...
- We have a linear system Y = AX: X contains the T(i,j) values to be estimated; A is the routing matrix (which OD pairs traverse which links, determined by the IGP link weights); Y are the link loads, obtained using SNMP
- Fundamental problem: # links << # OD pairs ⇒ an under-constrained system with infinitely many solutions
- A variety of different solutions have been proposed
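The under-constrained system can be made concrete with a toy example: one measured link carrying two OD pairs, solved by least squares with a prior, in the spirit of gravity-model-based estimation. The routing matrix, link load, and prior below are all invented:

```python
def estimate_tm(A, y, prior, lam=0.1, steps=5000, lr=0.01):
    """Minimize ||A x - y||^2 + lam * ||x - prior||^2 by gradient descent.
    A: routing matrix (list of rows), y: measured link loads, prior: an
    initial traffic-matrix guess (e.g. from a gravity model). The
    regularization term picks one of the infinitely many exact solutions."""
    n = len(prior)
    x = list(prior)
    m = len(y)
    for _ in range(steps):
        # residuals r_i = (A x)_i - y_i
        r = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        for j in range(n):
            g = 2 * sum(A[i][j] * r[i] for i in range(m))
            g += 2 * lam * (x[j] - prior[j])
            x[j] -= lr * g
    return x

# One link whose load is the sum of two OD flows: x1 + x2 = 10.
# One equation, two unknowns: the system is under-constrained.
A = [[1.0, 1.0]]
y = [10.0]
prior = [6.0, 2.0]             # gravity-model-style guess (sums to 8)
x = estimate_tm(A, y, prior)   # spreads the missing 2 units over both flows
```

The solution stays close to the prior while nearly satisfying the link constraint, which is exactly the role the prior plays in real traffic-matrix estimators.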
Anomaly detection
Study abnormal traffic:
- Non-productive traffic, a.k.a. Internet "background radiation"
- Traffic that is malicious (scans for vulnerabilities, worms) or mostly harmless (misconfigured devices)
Network troubleshooting:
- Identify and locate misconfigured or compromised devices
Intrusion detection:
- Identify malicious activity before it hits you
- Honeypot project at Eurecom (http://www.leurrecom.org): characterize attack processes
Case study on TCP Root Cause Analysis
References:
- Yin Zhang, Lee Breslau, Vern Paxson, and Scott Shenker. On the Characteristics and Origins of Internet Flow Rates. SIGCOMM 2002.
- M. Siekkinen, G. Urvoy-Keller, E. Biersack, and T. En-Najjary. Root Cause Analysis for Long-Lived TCP Connections. CoNEXT 2005.
Case study on TCP Root Cause Analysis
Outline
Motivation and Objectives
Flow rate characteristics
Taxonomy of TCP rate limitation causes
One approach to infer limitation causes
Experimental results
Conclusions
Motivation
Facts about the Internet over the last 5 years:
- Traffic volumes and the number of users have skyrocketed
- Access link capacities have multiplied
- Dominance has shifted from Web+FTP to peer-to-peer applications
- TCP has kept its position as the dominant transport protocol
Motivation
Questions are raised:
- ISPs want to know what is going on in their networks
- What are the limitations that Internet applications are facing?
- Why does a client with 4 Mbit/s ADSL access obtain a total throughput of only 2 Mbit/s when downloading movies with eDonkey?
⇒ Need techniques for traffic measurement and analysis
Objectives of TCP RCA
- Learn more about the rates and rate limitations of data transfers in the Internet: TCP typically carries over 90% of all traffic
- Study long-lived connections: reveal the root cause of limitation; do quantitative analysis
Passive traffic analysis techniques:
- Observe traffic at a single measurement point (e.g. at the edge of an ISP's network)
- Capture and store TCP/IP headers; analyze later off-line
- ☺ See the earlier slides about passive measurements
- ☹ Many parameters need to be estimated
Flow rate characteristics: Datasets and methodology
Datasets:
- Packet traces at ISP backbones and campus access links: 8 datasets; each lasts 0.5-24 hours; over 110 million packets
- Summary flow statistics collected at 19 backbone routers: 76 datasets; each lasts 24 hours; over 20 billion packets
Flow definition:
- Flow ID: <SrcIP, DstIP, SrcPort, DstPort, Protocol>; timeout: 60 seconds
- Rate = Size / Duration; flows with duration < 100 ms are excluded
Look at:
- Rate distribution
- Correlations among rate, size, and duration
Datasets
Observe:
- The diversity of the datasets (in terms of aggregation level)
- Sampling is required as speeds increase
- Not all traces are bidirectional
Flow rate characteristics: Rate distribution
- Most flows are slow, but most bytes are in fast flows
- The distribution is skewed, but not as skewed as the size distribution
- 10% of flows have a rate > 100 kbit/s
- 10% of flows send > 10 kbytes
Flow rate characteristics: Correlations
- Rate and size are strongly correlated (R: rate, D: duration, S: size)
- Not due to TCP slow start: removing the initial 1 second of each connection increases the correlations
- What users download is a function of their bandwidth
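The correlation analysis can be reproduced in miniature by computing the Pearson correlation between log(rate) and log(size). The flows below are synthetic, constructed so that larger transfers are also faster, mimicking the effect observed in the paper:

```python
from math import log, sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Synthetic flows: (size_bytes, duration_s); rate = size / duration
flows = [(10_000, 2.0), (50_000, 3.0), (200_000, 4.0),
         (1_000_000, 5.0), (5_000_000, 6.0)]
log_sizes = [log(s) for s, d in flows]
log_rates = [log(s / d) for s, d in flows]
r = pearson(log_sizes, log_rates)   # strongly positive on this toy data
```

Working in log space is the usual choice here because both rates and sizes span several orders of magnitude.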
Limitation causes for TCP throughput
- Application
- TCP end point: receiver window limitation
- Network: bottleneck link
- TCP protocol: slow start (SS) and congestion avoidance (CA)
(Figure: sender and receiver protocol stacks (application, TCP with buffers, network) illustrating where each limitation arises)
An example of an xplot time-sequence diagram
(Figure: the diagram shows sent data packets, received acknowledgments, retransmitted data, pushed data packets (each marked with a diamond), the outstanding bytes, and the size and limit of the receiver advertised window)
Limitation causes: Application
An application that sends small amounts of data at a constant rate:
- Streaming applications: Skype (an Internet telephony application), web radios
- Throttling applications: P2P, e.g. eDonkey (rate control by the user)
An application that sends larger bursts separated by idle periods:
- BitTorrent, HTTP/1.1 (persistent connections)
(Figure: transfer periods separated by periods with only keep-alive messages)
Limitation causes: TCP end points: receiver window limitation
- The maximum amount of outstanding bytes = min(cwnd, rwnd)
- Intentionally: the sender's upload capacity is too high compared to the receiving TCP's download capacity
- Unintentionally: the default maximum receiver advertised window is set too low by the operating system, or window scaling is not enabled
Limitation causes: Network
- The limitation is due to a bottleneck link: packets get dropped due to filled buffers
- Shared bottleneck: the connection obtains only a fraction of the link's capacity
- Non-shared bottleneck: the connection obtains all of the capacity
Limitation causes: TCP protocol
- The limiting factor is TCP's congestion avoidance or slow start algorithm
- The transfer ends before the rate grows enough to hit the limits set by the network or the receiving TCP
One approach to Root Cause Analysis
Input:
- A bidirectional packet trace
Aggregation level:
- Connection-level analysis
Divide & conquer:
1. Identify bulk transfer periods (the other traffic is limited by the application)
2. Analyze the bulk transfer periods for receiver window limitation and network limitation
The methods are based on generated time series
Identifying bulk transfer periods (BTPs)
- Use a time series of the fraction of pushed data packets: the fraction (P) of pushed data packets among all data packets seen in each non-overlapping time window
(Figure: a packet time line divided into windows with, e.g., P = 0.25, P = 1, P = 0.75; diamonds mark the packets with the push flag set)
Identifying BTPs
- n1 consecutive windows with P < pth start a bulk transfer period
- n2 consecutive windows with P > pth start an application-limited period
- Drawback: pth, n1, n2, and the time window must be selected for each application
- We chose empirically pth = 0.7, n1 = 5, n2 = 10, and a time window of 1 s for BitTorrent
- We already have a more generic solution
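The threshold scheme can be sketched directly. The per-window P values below are invented, and the exact start-of-period bookkeeping (whether a period begins retroactively at the first of the n1 windows) is simplified:

```python
def split_periods(p_series, pth=0.7, n1=5, n2=10):
    """Label each window 'BTP' or 'ALP': n1 consecutive windows with
    P < pth start a bulk transfer period, n2 consecutive windows with
    P > pth start an application-limited period."""
    state = "ALP"        # assume application-limited until proven bulk
    run = 0              # length of the current run of opposite windows
    labels = []
    for p in p_series:
        if state == "ALP":
            run = run + 1 if p < pth else 0
            if run >= n1:
                state, run = "BTP", 0
        else:
            run = run + 1 if p > pth else 0
            if run >= n2:
                state, run = "ALP", 0
        labels.append(state)
    return labels

# 20 bulk-looking windows (few pushed packets) followed by 20
# application-limited windows (every data packet pushed)
series = [0.1] * 20 + [1.0] * 20
labels = split_periods(series)
```

The two runs of opposite windows before each switch are the hysteresis that keeps a single odd window from toggling the state.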
BTP analysis
- Apply the time series-based limitation test algorithms
- The tests give limitation scores: the degree of limitation
- TCP end points: receiver window limitation score
- Network: retransmission score and dispersion score
Receiver window limitation test
- Uses two time series: the outstanding bytes (O) and the receiver advertised window (R)
- Compare R and O for each pair of values: this indicates how close the sender is to the limit set by the receiver advertised window; output 1 if R ≈ O, and 0 otherwise
- The limitation score is the average value of the R vs. O comparison: it indicates the fraction of time the sender is limited by the receiver advertised window
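The test then reduces to a few lines. How close R must be to O to count as "limited" is an assumption here (within one MSS), and the time series are invented:

```python
def rwnd_limitation_score(outstanding, rwnd, mss=1460):
    """Fraction of samples where the outstanding bytes (O) are within one
    MSS of the receiver advertised window (R), i.e. the sender sits at
    the limit set by the receiver. 'Within one MSS' is an assumed notion
    of R being approximately equal to O."""
    hits = [1 if (r - o) <= mss else 0 for o, r in zip(outstanding, rwnd)]
    return sum(hits) / len(hits)

# Synthetic series: the sender rides the ~64 KB advertised window for
# the first three samples, then falls well below it
O = [64500, 63500, 64240, 20000, 15000]
R = [65535, 64240, 65535, 65535, 65535]
score = rwnd_limitation_score(O, R)   # limited in 3 of 5 samples
```

A score near 1 flags the BTP as receiver-window limited; a score near 0 sends the analysis on to the network limitation tests.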
Network limitation test
- Retransmission score: the fraction of bytes retransmitted
- Dispersion score (DS): assesses the impact of the bottleneck on the throughput:

  DS = 1 - tput / r, where r is the capacity of the path

- If DS is close to zero (tput ≈ r): non-shared bottleneck link; otherwise: shared bottleneck link
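Both scores are simple ratios; a sketch with invented numbers:

```python
def retransmission_score(retransmitted_bytes, total_bytes):
    """Fraction of the bytes that were retransmitted during the BTP."""
    return retransmitted_bytes / total_bytes

def dispersion_score(tput, capacity):
    """DS = 1 - tput / r: close to 0 means the transfer fills the path
    (non-shared bottleneck); close to 1 means it gets a small share
    (shared bottleneck)."""
    return 1.0 - tput / capacity

rs = retransmission_score(120_000, 1_000_000)   # 12% of bytes retransmitted
ds = dispersion_score(tput=2e6, capacity=8e6)   # obtains 1/4 of the capacity
```

In this example the high dispersion score together with the noticeable retransmissions would point at a shared, congested bottleneck.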
Experimental results
- Analyzed a large BitTorrent packet trace
- Seeding a single very big torrent
- 102 million packets
- 60,000 connections
Bulk transfer periods
- 3295 BTPs from 696 connections
- BTPs carry most of the bytes, but application-limited periods (ALPs) are longer
Limitation scores
Receiver window limitation:
- 65% of BTPs are not limited by rwnd at all
- BTPs with small average receiver windows have higher scores
Network limitation:
- High retransmission scores: 20% of BTPs retransmitted over 10% of the bytes
- High dispersion scores: 95% of BTPs achieve less than half of the capacity of the path
Network limitation (BTPs with a receiver window limitation score < 0.5):
- A high retransmission score induces a high dispersion score
(Figure: a time-sequence diagram of a complete transfer period: no loss, the RTT is 20 times the initial RTT, a shared bottleneck, and the transfer ends before cwnd reaches the rwnd value)
Q: What is the limiting factor?
A: The TCP protocol, through the congestion avoidance algorithm
Conclusions
Characteristics of Internet flow rates:
- Fast flows carry most of the bytes: it is important to understand their behavior
- Strong correlation between flow rate and size: what users download is a function of their bandwidth
TCP transmission rate limitation analysis:
- Causes can be on different layers (application, TCP, or IP) and in different locations (end hosts, network)
- One approach is to use time series-based techniques: allows for quantitative analysis