NET0183 Networks and Communications Lectures 17 and 18 Measurements of internet traffic (IP) 8/25/2009 1 NET0183 Networks and Communications by Dr Andy Brooks

NET0183 Networks and Communications - Háskólinn á …staff.unak.is/andy/Networks0910/Lectures/NET0183Lec1… ·  · 2010-02-17NET0183 Networks and Communications by Dr Andy Brooks


NET0183 Networks and Communications

Lectures 17 and 18: Measurements of internet traffic (IP)

8/25/2009 1NET0183 Networks and Communications

by Dr Andy Brooks


http://portal.acm.org/citation.cfm?id=1298321

2.1 Collection of Traces

• Traces were collected on an SDH ring running Packet over SONET (PoS).

– “Packet over SONET/SDH, abbreviated POS, is a communications protocol for transmitting packets in the form of the Point to Point Protocol (PPP) over SDH or SONET, which are both standard protocols for communicating digital information using lasers or light emitting diodes (LEDs) over optical fibre at high line rates.” (Wikipedia, 16-Feb-10)


Synchronous Optical Networking (SONET)

Synchronous Digital Hierarchy (SDH)

• “On the two OC-192 links (two directions) we use optical splitters attached to two Endace DAG6.2SE cards. The DAG cards captured the first 120 bytes of each frame to ensure that the entire network and transport header information is preserved.”

– Endace is a supplier of high-speed network traffic capture technology.


2.1 Collection of Traces

Network measurements: in software or in hardware?

• Software tools interact with the operating system and device drivers to obtain copies of network packets.

• Software tools are however limited: they can't capture everything on a high-speed link.

• Special hardware is designed for high-speed links such as an internet backbone.

• Special hardware captures traffic directly by, for example, using optical splitters.

• Software tools are typically inexpensive, while special hardware is typically expensive.


Network measurements: online or offline processing?

• Full packet tracing creates large amounts of data.

• Offline processing is not time critical and data can be re-analyzed in various ways.

• Online processing can involve filtering according to hosts or port numbers or other properties.

• Online processing can involve sampling where every nth packet is recorded.

• Online processing can involve packet truncation where only a fixed number of bytes are stored.
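The sampling and truncation steps above can be sketched in a few lines of Python. This is only an illustration: the function names and the in-memory list of packets are assumptions, and the default of 120 bytes simply mirrors the capture length quoted earlier for the DAG cards.

```python
def sample_every_nth(packets, n):
    """Keep the nth, 2nth, 3nth, ... packet (1-based counting)."""
    return packets[n - 1::n]


def truncate(packet: bytes, snaplen: int = 120) -> bytes:
    """Keep only the first `snaplen` bytes of a captured frame."""
    return packet[:snaplen]


# Hypothetical capture: ten frames of 200 bytes each.
frames = [bytes([i]) * 200 for i in range(1, 11)]
kept = [truncate(f) for f in sample_every_nth(frames, 3)]
print(len(kept), len(kept[0]))
```

Sampling reduces the number of stored packets, while truncation reduces the bytes stored per packet; the two can be combined, as in the last lines.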


• Very roughly speaking, the measurements were taken on links between the region of Göteborg and the rest of the Internet.


2.1 Collection of Traces

E-mail traffic/E-commerce traffic/Online gaming traffic/...

Search engine traffic/Social-networking traffic/...

• “Optical Carrier is a standardized set of specifications of transmission speeds that describe a range of digital signals that can be carried on Synchronous Optical Networking (SONET) fiber optic networks. The number attached to the Optical Carrier abbreviation, e.g., OC-48, is directly proportional to the data rate of the bitstream of the digital signal.

The rule for calculating the speed of optical-carrier-classified lines is that a specification given as OC-n designates a speed of n × 51.84 Mbit/s.” (Wikipedia, 16-Feb-10)

• OC-192 = 192 × 51.84 Mbit/s = 9,953.28 Mbit/s.
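The OC-n rule quoted above is a single multiplication, which the following sketch applies to a few common levels. The function name is an assumption made for illustration.

```python
OC1_MBIT_PER_S = 51.84  # base rate from the quoted rule


def oc_rate_mbit_per_s(n: int) -> float:
    """Line rate of an OC-n link in Mbit/s (n x 51.84)."""
    return n * OC1_MBIT_PER_S


for level in (3, 48, 192):
    print(f"OC-{level}: {oc_rate_mbit_per_s(level):,.2f} Mbit/s")
```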


2.1 Collection of Traces


• “The data collection was performed between the 7th of April 2006, 2AM and the 26th of April 2006, 10AM. During this period, we simultaneously for both directions collected four traces of 20 minutes each day at identical times. The times (2AM, 10AM, 2PM, 8PM) were chosen to cover business, non-business and nighttime hours.”


4 traces per day * 2 directions * 20 consecutive days = 160 traces

2.2 Processing and Analysis

• The DAG cards discarded 20 frames within 12 traces due to receiver errors or HDLC CRC errors.

– High-Level Data Link Control (HDLC) is a protocol.

– Point to Point Protocol (PPP) is based on HDLC.

• After storing the data on disk, the payload beyond the transport layer was removed.


Respect the user's privacy. Delete any user data.

The first 120 bytes might include some user data.

http://windowsbootdisks.com/ip_images/ipmodelen.gif

2.2 Processing and Analysis

• A total of 71 frames within 30 traces had to be discarded due to IP checksum errors.

– Single checksum errors are minor errors.

• “Trace sanitization refers to the process of checking and ensuring that the collected traces are free from logical inconsistencies and are suitable for further analysis.”


Sanity checks can include:

• Do timestamps on packets increase monotonically?

– A packet should not have a timestamp earlier than a packet before it in the trace.

• Are the interarrival times between packets in keeping with packet sizes and the line-speed of the carrier?

– First, calculate how many packets you should be detecting on average each second.

• Are there any identical IP headers within consecutive frames?

• Do counts before and after de-sensitization agree?

– Anonymizing the data should not change the results.
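Three of the sanity checks listed above can be sketched as small Python functions. This is a minimal sketch, not the procedure used in the paper: the function names, the list-based trace representation, and the sample values are assumptions for illustration.

```python
def timestamps_monotonic(timestamps):
    """True if no packet is stamped earlier than the one before it."""
    return all(a <= b for a, b in zip(timestamps, timestamps[1:]))


def min_interarrival_seconds(packet_bytes, line_rate_bit_per_s):
    """Time needed to put one packet on the wire; packets cannot
    arrive closer together than this on a single link."""
    return packet_bytes * 8 / line_rate_bit_per_s


def consecutive_duplicates(headers):
    """Indices of frames whose IP header equals the previous frame's."""
    return [i for i in range(1, len(headers)) if headers[i] == headers[i - 1]]


# Hypothetical trace fragment.
stamps = [0.000001, 0.000003, 0.000002]
print(timestamps_monotonic(stamps))                 # the third stamp goes backwards
print(min_interarrival_seconds(1500, 9953.28e6))    # roughly 1.2 microseconds
print(consecutive_duplicates([b"A", b"B", b"B", b"C"]))
```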


2.2 Processing and Analysis

• Trace desensitization refers to the process of ensuring privacy and confidentiality.

– Packet payloads beyond the transport layer were removed very early in the processing.

• IP addresses were anonymized using the prefix-preserving Crypto-PAn tool.

– “In Crypto-PAn, the IP address anonymization is prefix-preserving. That is, if two original IP addresses share a k-bit prefix, their anonymized mappings will also share a k-bit prefix.”
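The prefix-preserving property can be demonstrated with a toy anonymizer. This sketch is not Crypto-PAn itself: it swaps in HMAC-SHA256 from the Python standard library as the keyed pseudorandom function, and the function names and the demo key are assumptions. The idea it illustrates is that each output bit is the input bit, flipped or not depending only on the bits that precede it.

```python
import hashlib
import hmac
import ipaddress


def anonymize(addr: str, key: bytes) -> str:
    """Toy prefix-preserving mapping (HMAC-based, NOT the real Crypto-PAn)."""
    bits = format(int(ipaddress.IPv4Address(addr)), "032b")
    out = []
    for i in range(32):
        # The flip decision for bit i depends only on the first i input bits.
        digest = hmac.new(key, bits[:i].encode(), hashlib.sha256).digest()
        flip = digest[0] & 1
        out.append(str(int(bits[i]) ^ flip))
    return str(ipaddress.IPv4Address(int("".join(out), 2)))


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


key = b"demo key (assumption)"
a, b = "10.1.2.3", "10.1.200.9"
print(shared_prefix_len(a, b),
      shared_prefix_len(anonymize(a, key), anonymize(b, key)))
```

Both printed numbers are equal: two addresses that share a k-bit prefix see identical flip decisions for those k bits, so their anonymized forms share the same k bits.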


3. RESULTS

• “The 148 traces analyzed sum up to 10.77 billion PoS frames, containing a total of 7.6 TB of data. 99.97% of the frames contain IPv4 packets, summing up to 99.99% of the carried data. The remaining traffic consists of different routing protocols (BGP, CLNP, CDP).”

– 12 discarded traces: 12/160 = 7.5%

– 91 discarded frames (20 + 71): 91/10.77 billion ≈ 8.4 × 10⁻⁹

• “The results in the remainder of this paper are based on IPv4 traffic only.”
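The bookkeeping behind these figures can be re-derived with a short calculation. All inputs are the numbers stated on the slides; the variable names are assumptions.

```python
traces_per_day, directions, days = 4, 2, 20
collected = traces_per_day * directions * days       # 160 traces
discarded_traces = 12
analyzed = collected - discarded_traces               # 148 traces

discarded_frames = 20 + 71                            # DAG errors + IP checksum errors
total_frames = 10.77e9

print(collected, analyzed)
print(f"{discarded_traces / collected:.1%} of traces discarded")
print(f"{discarded_frames / total_frames:.2e} of frames discarded")
```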


BGP

http://www.webopedia.com/TERM/B/BGP.html


“Short for Border Gateway Protocol, an exterior gateway routing protocol that enables groups of routers (called autonomous systems) to share routing information so that efficient, loop-free routes can be established. BGP is commonly used within and between Internet Service Providers (ISPs). The protocol is defined in RFC 1771.”

CDP

http://www.webopedia.com/TERM/B/BGP.html


“The Cisco Discovery Protocol (CDP) was developed by Cisco Systems. It's primarily used to obtain the protocol addresses of neighboring devices and also to discover the platform of those devices. It can also be used to show information about the interfaces your router uses. CDP is media and protocol-independent, and runs on all Cisco-manufactured equipment including routers, bridges, access servers, and switches. CDP runs only over the data link layer enabling two systems that support different network-layer protocols to learn about each other. CDP Version-2 (CDPv2) is the most recent release of the protocol and provides more intelligent device tracking features.”

3.1.1 IP packet size distribution

• Earlier measurements showed the distribution of IPv4 packet lengths to be trimodal:

– ≈ 40 bytes (TCP acknowledgements)

– 576 bytes, default IP datagram size (RFC 879)

– ≈ 1500 bytes, Ethernet Maximum Transmission Unit

• Between 1997 and 2002, studies reported that the fraction of packets with the default IP datagram size ranged between 10% and 40%.

• In 2004, a study by Pentikousis et al. found the distribution to be bimodal, with the default IP datagram size accounting for only 3.8% of all packets.


Figure 1. The Cumulative IPv4 Packet Size Distribution ©ACM


3.1.1 IP packet size distribution

Figure 1. The Cumulative IPv4 Packet Size Distribution

• The distribution is bimodal.

• 44% of packets between 40 and 100 bytes.

• 37% of packets between 1400 and 1500 bytes.

• The default IP datagram size of 576 bytes now represents only 0.95% of the traffic and is no longer among the top three modes.

– “This is caused by the predominance of Path MTU Discovery in today's TCP implementations...”
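Statements such as "44% of packets between 40 and 100 bytes" come from counting packet sizes in a trace. The sketch below shows how such shares and a cumulative distribution like Figure 1 could be computed; the function names and the ten-packet sample are invented for illustration and are not data from the paper.

```python
from collections import Counter


def share_in_range(sizes, low, high):
    """Fraction of packets whose size lies in [low, high] bytes."""
    return sum(1 for s in sizes if low <= s <= high) / len(sizes)


def cumulative_distribution(sizes):
    """List of (size, fraction of packets with size <= that value)."""
    counts = Counter(sizes)
    running, cdf = 0, []
    for size in sorted(counts):
        running += counts[size]
        cdf.append((size, running / len(sizes)))
    return cdf


# Hypothetical sample of IPv4 packet sizes in bytes.
sample = [40, 40, 52, 576, 1500, 1500, 1480, 1500, 40, 628]
print(share_in_range(sample, 40, 100))
print(share_in_range(sample, 1400, 1500))
print(cumulative_distribution(sample))
```

A bimodal distribution shows up in the cumulative curve as two steep steps, one near the small sizes and one near the MTU.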


Path MTU Discovery (Wikipedia, 17-Feb-10)


“Path MTU discovery works by setting the DF (Don't Fragment) option bit in the IP headers of outgoing packets. Then, any device along the path whose MTU is smaller than the packet will drop it, and send back an ICMP "Fragmentation Needed" (Type 3, Code 4) message containing its MTU, allowing the source host to reduce its path MTU appropriately. The process repeats until the MTU is small enough to traverse the entire path without fragmentation.”

Many networks block ICMP for security reasons…

“A robust method for PMTUD that relies on TCP or some other packetization layer to probe the path with progressively larger packets has been standardized in RFC 4821 (Packetization Layer Path MTU Discovery).”
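The classic ICMP-based discovery loop quoted above can be simulated in a few lines. This is a simplified model, not a network implementation: the path is represented as a list of per-hop MTUs, every hop is assumed to return the ICMP "Fragmentation Needed" message, and the function name is an assumption.

```python
def discover_path_mtu(hop_mtus, initial_mtu=1500):
    """Simulate PMTUD: send DF packets, shrink on each ICMP report.

    Returns (path_mtu, number_of_probes_sent).
    """
    mtu, probes = initial_mtu, 0
    while True:
        probes += 1
        for hop_mtu in hop_mtus:
            if hop_mtu < mtu:
                # Hop drops the packet and reports its own MTU.
                mtu = hop_mtu
                break
        else:
            # No hop objected: the packet traversed the whole path.
            return mtu, probes


print(discover_path_mtu([1500, 1400, 1300, 1480]))
```

If a hop silently drops the packet without sending the ICMP message, this loop would never learn the smaller MTU, which is the failure that the RFC 4821 approach is designed to avoid.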

Figure 1. The Cumulative IPv4 Packet Size Distribution

• Two modes appear at 628 bytes and 1300 bytes representing 1.76% and 1.1% of the traffic.

• Analysing the TCP flows it was found that packets with 628 bytes usually came after full-sized packets and had the PUSH flag set.

• “We suspect that they are sent by applications doing ’TCP layer fragmentation’ on 2KB blocks of data, indicating the end of a data block by PUSH.”

• “A look at the TCP destination ports revealed that large fractions of this traffic are indeed sent to ports known to be used for popular file-sharing protocols like Bittorrent and DirectConnect.”
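The 628-byte mode is consistent with the suspected 2 KB application blocks, as a short calculation suggests. The sketch assumes a 1500-byte MTU and 40 bytes of IP and TCP headers without options; these values and the function name are assumptions for illustration, not figures taken from the paper.

```python
def segment_packet_sizes(block_bytes, mtu=1500, header_bytes=40):
    """IP packet sizes produced when one application block is sent over TCP."""
    max_payload = mtu - header_bytes          # 1460 bytes with the defaults
    sizes, remaining = [], block_bytes
    while remaining > 0:
        payload = min(max_payload, remaining)
        sizes.append(payload + header_bytes)
        remaining -= payload
    return sizes


print(segment_packet_sizes(2048))
```

Under these assumptions a 2048-byte block yields one full-sized 1500-byte packet followed by a 628-byte packet, matching the observation that 628-byte packets usually follow full-sized ones.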


Figure 1. The Cumulative IPv4 Packet Size Distribution

• “The notable step at 1300 bytes on the other hand could be explained by the recommended IP MTU for IPsec VPN tunnels.”


IPsec: Short for IP Security, a set of protocols developed by the IETF to support secure exchange of packets at the IP layer. IPsec has been deployed widely to implement Virtual Private Networks (VPNs). http://www.webopedia.com/TERM/I/IPsec.html

Virtual private network: A virtual private network (VPN) is a computer network that is layered on top of an underlying computer network. The term VPN can be used to describe many different network configurations and protocols. http://en.wikipedia.org/wiki/Virtual_private_network

Large packets

• 0.15% of traffic was larger than 1500 bytes.

– The standard Ethernet MTU is 1500 bytes.

• Of the 0.15%, 99.7% were 4470 bytes.

• Packet sizes up to 8192 bytes were observed.

• “A minor part of the >1500 byte sized packets represents BGP updates between backbone or access routers.”

– The majority of the large packet traffic was identified as customized data-transfer from a space observatory to a data center using jumbo frames over Ethernet.

• Jumbo frames are non-standard Ethernet.


Table 1(a) IPv4 Protocol Breakdown (%) ©ACM


Earlier measurements reported TCP accounting for around 90–95% of the data volume and for around 85–90% of IP packets.

Both fractions are slightly larger in Table 1(a).

Table 1(b) UDP burst (%) ©ACM


The UDP packets causing the burst were only 29 bytes, leaving only a single byte for UDP payload data.

On investigation, it was found that the burst stemmed from an undetected UDP DoS (Denial of Service) script on a webserver with a known vulnerability.
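The single byte of payload follows from the fixed header sizes. A minimal sketch, assuming a 20-byte IPv4 header without options and the standard 8-byte UDP header (the function name is hypothetical):

```python
IPV4_HEADER_BYTES = 20  # assumes no IP options
UDP_HEADER_BYTES = 8


def udp_payload_bytes(ip_packet_bytes: int) -> int:
    """Bytes left for application data in a UDP-over-IPv4 packet."""
    return ip_packet_bytes - IPV4_HEADER_BYTES - UDP_HEADER_BYTES


print(udp_payload_bytes(29))
```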


http://www.fatpipe.org/~mjb/Drawings/

Andy comments: To check the checksum calculation, we need to know what the pseudo header is.
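For UDP over IPv4, the pseudo header consists of the source address, the destination address, a zero byte, the protocol number (17 for UDP), and the UDP length. The sketch below builds it and checks a checksum; the helper names, addresses, and ports are assumptions chosen for illustration.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"                       # pad to a whole 16-bit word
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                        # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """IPv4 pseudo header: src, dst, zero, protocol 17, UDP length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 17, udp_length))


def build_udp_datagram(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> bytes:
    length = 8 + len(payload)
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    checksum = internet_checksum(pseudo_header(src_ip, dst_ip, length) + header + payload)
    checksum = checksum or 0xFFFF             # a computed 0 is sent as 0xFFFF
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload


def udp_checksum_ok(src_ip, dst_ip, datagram: bytes) -> bool:
    """A valid datagram sums, together with its pseudo header, to zero."""
    return internet_checksum(pseudo_header(src_ip, dst_ip, len(datagram)) + datagram) == 0


dgram = build_udp_datagram("192.0.2.1", "198.51.100.2", 1234, 53, b"x")
print(len(dgram), udp_checksum_ok("192.0.2.1", "198.51.100.2", dgram))
```

Because the addresses are part of the checksum, verifying the same datagram against a different source address fails, which is the reason the pseudo header must be known.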


http://www.fatpipe.org/~mjb/Drawings/

3.2.1. IP type of service

• The TOS field can be used for Explicit Congestion Notification (ECN) and Differentiated Services.

• 83.1% of the packets stored a value of zero in the TOS field, indicating the TOS field was not being used.

• “In our data only 1.0 million IPv4 packets provide ECN capable transport (either one of the ECT bits set) and additionally 1.1 million packets actually show ’congestion experienced’ (both bits set). This means that ECN is implemented in only around 0.02% of the IPv4 traffic.”
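How the TOS byte is split between Differentiated Services and ECN can be shown with two bit operations: the upper six bits carry the DiffServ codepoint and the lower two bits carry the ECN field. The function name is an assumption; the codepoint names follow RFC 3168.

```python
ECN_NAMES = {
    0b00: "Not-ECT",   # ECN not used
    0b01: "ECT(1)",    # ECN-capable transport
    0b10: "ECT(0)",    # ECN-capable transport
    0b11: "CE",        # congestion experienced (both bits set)
}


def split_tos(tos: int):
    """Return (DSCP value, ECN name) for an 8-bit TOS field."""
    return tos >> 2, ECN_NAMES[tos & 0b11]


for value in (0x00, 0x02, 0x03, 0xB8):
    print(f"TOS 0x{value:02X} ->", split_tos(value))
```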


3.2.1 IP type of service

• Medina et al. found that the share of ECN-capable webservers went from 1.1% in 2000 to 2.1% in 2004.

• “... suggesting that the number of ECN-aware routers is still very small.”

• “Valid ’Pool 1’ DiffServ Codepoints (RFC 2474) account for 16.8% of all TOS fields.”

– Different applications have different needs.


Andy comments: It would be useful to find a more recent study on ECN and Differentiated Services.

3.2.2 IP Options

• Only 68 packets carried IP Options.

– 68 out of 10.77 billion

• One 20-minute trace contained 45 packets with IP Option 7 (Record Route).

• Three traces had 12 packets with IP Option 148 (Router Alert).
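The option numbers 7 and 148 are values of the one-byte option type, which RFC 791 divides into a copied flag (1 bit), an option class (2 bits), and an option number (5 bits). A small sketch with a hypothetical function name:

```python
def decode_option_type(option_type: int) -> dict:
    """Split an IPv4 option type byte into its three subfields."""
    return {
        "copied": option_type >> 7,            # copy into every fragment?
        "class": (option_type >> 5) & 0b11,    # 0 = control
        "number": option_type & 0b11111,
    }


print(decode_option_type(7))     # Record Route
print(decode_option_type(148))   # Router Alert
```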


“The analysis of IP options showed that they are virtually not used.”

3.2.3 IP fragmentation

• In 2000, McCreary et al. found an increase in the fraction of packets carrying fragmented traffic from 0.03% to 0.15%.

• In 2002, Shannon et al. found the fraction of packets to be 0.67%.

• “Contrary to this trend, we found a much smaller fraction of 0.06% of fragmented traffic in the analyzed data.”


3.2.3 IP fragmentation

• 63% of the outgoing fragmented traffic was IPsec ESP traffic (RFC4303), observed between exactly one source and one receiver.

• A fragment series comprised one full length Ethernet MTU followed by a 72 byte fragment.

• “This can easily be explained by an unsuitably configured host/VPN combination transmitting 1532 bytes (1572 – 40 bytes IP and TCP header) instead of the Ethernet MTU due to the additional ESP header.”
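The observed fragment series can be reproduced with a simple model of IPv4 fragmentation. The sketch assumes that the 1532 bytes are carried as IP payload behind a 20-byte header without options; this reading, and the function name, are assumptions for illustration rather than statements from the paper.

```python
def fragment_sizes(ip_payload_bytes, mtu=1500, ip_header=20):
    """Sizes of the IP fragments produced for one oversized datagram."""
    # Every fragment except the last must carry a multiple of 8 data bytes.
    max_data = (mtu - ip_header) // 8 * 8
    sizes, remaining = [], ip_payload_bytes
    while remaining > 0:
        chunk = min(max_data, remaining)
        sizes.append(chunk + ip_header)        # each fragment has its own header
        remaining -= chunk
    return sizes


print(fragment_sizes(1532))
```

Under these assumptions the datagram splits into a full 1500-byte fragment and a 72-byte fragment, the pattern described above.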


Encapsulating Security Payload (ESP)

3.2.4 IP flags

• 91.3% of the packets have the don't fragment bit (D or DF bit) set, “as proposed by Path MTU Discovery (RFC 1191)”.

• 0.04% of the packets have the more fragments bit (M or MF bit) set.

– fragmented traffic was 0.06%

• 8.65% of the packets use neither DF nor MF.


3.2.4 IP flags

• 27,474 IPv4 packets from 70 distinct IP sources had DF and MF set simultaneously!

– an invalid combination according to the IP specification (RFC 791)...

• “Looking at the traffic pattern and considering that UDP port 53 is used, it seems to be obvious that there is a DNS server using improper protocol stacks inside the Göteborg region.”


DNS

“(1) Short for Domain Name System (or Service or Server), an Internet service that translates domain names into IP addresses.”

“The DNS system is, in fact, its own network. If one DNS server doesn't know how to translate a particular domain name, it asks another one, and so on, until the correct IP address is returned.”

http://www.webopedia.com/TERM/D/DNS.html

3.2.4 IP flags

• 233 packets from 126 distinct sources had the reserved bit set.

• “According to the IP standard (RFC 791) the reserved bit must be zero, so this behavior has to be regarded as misbehavior.”
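The three flag bits discussed on these slides sit at the top of the 16-bit field that also holds the fragment offset. The sketch below decodes that field and flags the two anomalies reported above; the function names are assumptions, and the "DF and MF together" rule follows the slides' reading of RFC 791.

```python
def decode_flags(flags_and_offset: int) -> dict:
    """Decode the 16-bit flags / fragment offset field of an IPv4 header."""
    return {
        "reserved": bool(flags_and_offset & 0x8000),
        "DF": bool(flags_and_offset & 0x4000),
        "MF": bool(flags_and_offset & 0x2000),
        "offset_bytes": (flags_and_offset & 0x1FFF) * 8,  # offset is in 8-byte units
    }


def anomalies(flags_and_offset: int) -> list:
    """Names of the suspicious flag settings described on the slides."""
    f = decode_flags(flags_and_offset)
    found = []
    if f["reserved"]:
        found.append("reserved bit set")
    if f["DF"] and f["MF"]:
        found.append("DF and MF set simultaneously")
    return found


for field in (0x4000, 0x2000, 0x6000, 0x8000):
    print(f"0x{field:04X}", decode_flags(field), anomalies(field))
```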


Misbehaviour can be caused by bugs in software or by network attacks exploiting protocol vulnerabilities.