TOPOLOGY AWARE ESTIMATION METHODS FOR INTERNET TRAFFIC
CHARACTERISTICS
by
James A. Gast
A dissertation submitted in partial fulfillment of
the requirements for the degree of
Doctor of Philosophy
(Computer Sciences)
at the
UNIVERSITY OF WISCONSIN–MADISON
2003
© Copyright by James A. Gast 2003
All Rights Reserved
To Anne, who gave up everything for me three times.
ACKNOWLEDGMENTS
First and foremost, this thesis would not have been possible without the patience and clear-
headed thinking of Paul Barford and the healthy skepticism of Larry Landweber. They listened
patiently when I questioned data that disagreed with my preconceptions and guided me to all the
right papers and textbooks at exactly the right moments.
As with any modern program, my thesis work stands on the shoulders of countless people who
wrote tools, languages, and packages that were indispensable. To name them all here would be
impossible, but I want to single out Dave Plonka for his dedication to tools that made it easy for me
to collect and analyze traffic from Internet2.
Over three decades, I have had the joy and honor of brainstorming with some of the best
programmers and designers in open computer networking, and none are better than the team at the
Wisconsin Advanced Internet Lab. I had many important and valuable conversations with De
Byrd, Joel Sommers, and Vinod Yegneswaran. I am immensely grateful to John Morgridge and
the other WAIL donors for their very generous donation of equipment to WAIL and the Badger
Internet Group.
The insight and all of the mathematics for the dynamic programming algorithm in the clustering
part of the thesis were the work of Dr. Jin-Yi Cai. He wrote that treatment in a single amazing
wonder-weekend and it did not have a single flaw.
Thomas Hangelbroek did the initial programming to determine the centroid of the global Internet
and showed me MATLAB tricks I had never imagined.
Important and very helpful comments came from Dr. Robin Kravets. Her insights into Internet
topology studies were both inspired and inspiring.
Finally, I especially want to thank Drs. David DeWitt and Jeff Naughton for their faith in me.
And I want to thank the CS faculty for granting me the Anthony C. Klug fellowship in Computer
Science.
TABLE OF CONTENTS
Page
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Motivation and Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Successful Congestion Abatement . . . . . . . . . . . . . . . . . . . . . . 3
1.1.2 Where Congestion Occurs . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.3 Gap Between Congestion Events . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.4 New Models with the New Parameters . . . . . . . . . . . . . . . . . . . . 5
1.1.5 Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.6 Scalable Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.1.7 A Matrix of Traffic Demands . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2 Contributions of this Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2 Topology of the Internet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1 The Need for a Succinct Internet Graph . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Topologically-guided Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.3 Client Demand Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.4 Cache Placement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.5 Evaluation of Cache Placement Impact . . . . . . . . . . . . . . . . . . . . . . . . 40
2.6 Incorporating Knowledge of AS Relationships . . . . . . . . . . . . . . . . . . . . 42
2.7 Clustering Study Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3 Large Scale Simulation of Congested Behaviors . . . . . . . . . . . . . . . . . . . . 51
3.1 Simulating Congestion and the Effect on Traffic . . . . . . . . . . . . . . . . . . . 52
3.2 Surveyor Data: Looking for Characteristics of Queuing . . . . . . . . . . . . . . . 57
3.3 Window Size Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.4 Congestion Events and Flock Formation . . . . . . . . . . . . . . . . . . . . . . . 63
3.5 Congestion Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
3.6 Simulation Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4 Traffic Matrix Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.1 Capturing and Simplifying Abilene Traffic . . . . . . . . . . . . . . . . . . . . . . 87
4.2 Populating the Traffic Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.3 Ramifications of Sender and Receiver Memory Settings . . . . . . . . . . . . . . . 99
4.4 Coalescing Traffic into Minimal Unique Set . . . . . . . . . . . . . . . . . . . . . 111
4.5 Traffic Matrix Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
5 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5.1 Topology Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
5.2 Backbone Delay and Loss Related Work . . . . . . . . . . . . . . . . . . . . . . . 125
5.3 Related Work in Traffic Matrix Estimation . . . . . . . . . . . . . . . . . . . . . . 128
LIST OF REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
LIST OF TABLES
Table Page
2.1 Clusters Identified as Backbone by the Algorithm . . . . . . . . . . . . . . . . . . . . 24
2.2 Sample AS traceroute . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.1 Sample Link Tuples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.2 Sample Flow Data Records . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
4.3 Traffic Matrix Flow Tuple . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.4 Excerpt from Observed Traffic Matrix. Each entry is the volume of that flock in units normalized to a total volume of 1000 unambiguous connections . . . . . . . . . . . 98
4.5 Highest Volume AS Exits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
4.6 Achievable Bandwidth At 32 KByte Memory Limit, 1500 Byte Packets . . . . . . . . 114
4.7 Sample Assignment of AS Numbers to Equivalents . . . . . . . . . . . . . . . . . . . 116
4.8 Excerpt from Model Traffic Matrix Estimate . . . . . . . . . . . . . . . . . . . . . . 118
LIST OF FIGURES
Figure Page
1.1 It is surprisingly difficult to predict the changes that result from a simple change in the network. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2.1 Walk-through of the clustering algorithm . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2 Results of AS cluster formation. The left graph shows how the number of clusters declines as clusters are coalesced. The right graph shows how the path length in the derived tree compares to the path length in the original graph of best paths. . . . . . 23
2.3 Hops to the backbone . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.4 Demand aggregated to the 21 backbone nodes . . . . . . . . . . . . . . . . . . . . . . 28
2.5 Tadpole Graph Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.6 Performance versus random and greedy placement . . . . . . . . . . . . . . . . . . . 40
2.7 Early forest predicted only a tiny portion of the non-folded routes seen by traceroute. . 45
2.8 Adjusting the annotations in the graph reduced the number of folded (implausible) paths and improved prediction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.9 Results with final AS forest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.1 Probability density of queuing delays of 5 paths . . . . . . . . . . . . . . . . . . . . . 58
3.2 Cumulative distribution of queuing delays experienced along the 5 paths. . . . . . . . 59
3.3 Probability density of queuing delays on 5 paths that share a long prefix with each other. 59
3.4 Showing the probability of losing 0, exactly 1, or more than one packet in a single congestion event as a function of cWnd. . . . . . . . . . . . . . . . . . . . . . . . 61
3.5 Ingress Traffic in One Hop Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.6 Queue Rise and Fall in One Hop Simulation . . . . . . . . . . . . . . . . . . . . . . . 64
3.7 Probability of a Given Queuing Delay in the One Hop Simulation . . . . . . . . . . . 66
3.8 Simulation layout for two-hop traffic . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.9 Both signatures appear when queues of size 100 and 200 are used in a 2-hop path. . . 68
3.10 The distinctive signature of each queue shows up as a peak in the PDF. . . . . . . . . 69
3.11 Three hop simulation shows three distinct peaks . . . . . . . . . . . . . . . . . . . . 70
3.12 Simulation environment to foster window synchronization. . . . . . . . . . . . . . . . 71
3.13 Connections started at random times synchronize cWnd decline and buildup after 2 seconds. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.14 Connections with RTT slightly too long to join flock. . . . . . . . . . . . . . . . . . . 73
3.15 Proportion of time spent in each queue regime. . . . . . . . . . . . . . . . . . . . . . 74
3.16 Congestion Event Duration approaches reaction time. . . . . . . . . . . . . . . . . . . 77
3.17 As flocks at each RTT drop below cWnd 4, they lose much of their share of bandwidth. 77
3.18 Scalable Model Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.19 Finite State Machine for tracking the duration of congestion based on queue occupancy. 81
3.20 Queue regimes predicted by the congestion model . . . . . . . . . . . . . . . . . . . 82
4.1 Abilene Network Backbone, February 2003 . . . . . . . . . . . . . . . . . . . . . . . 88
4.2 Weather map of Abilene shows bits per second for each link averaged over 5 minutes . 89
4.3 Flight size graph shows one plus for each packet emitted by the sender. The 6 packets in each round are not evenly spaced. . . . . . . . . . . . . . . . . . . . . . 103
4.4 Typical Stretch ACK Connection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
4.5 Typical Delayed ACK Connection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Appendix Figure Page
4.6 Throughput to Selected Korean Destinations from Wisconsin . . . . . . . . . . . . . . 108
4.7 Throughput to Selected European Destinations from Wisconsin . . . . . . . . . . . . 109
TOPOLOGY AWARE ESTIMATION METHODS FOR INTERNET TRAFFIC
CHARACTERISTICS
James A. Gast
Under the supervision of Assistant Professor Paul Barford
At the University of Wisconsin-Madison
Attempts to represent the global Internet in simulations and emulations have been difficult even at
the most basic levels. The focus of our work is Internet topology and traffic matrix estimation to
accurately predict Internet capacity, utilization, and congestion. We describe a forest representation
of the topology of the Internet which improves on prior topologies by being more complete and
accurate. We present a novel, scalable simulation environment that models the interactions of col-
lections of flows across multi-hop networks and can accurately predict the way highly multiplexed
traffic will react to congestion. We show that round trip time and ceiling not caused by congestion
have a strong influence on the way traffic reacts to congestion. We show mechanisms that group
large numbers of connections into units we call flocks and demonstrate that flock behavior can be
seen in actual one-way delay data. Our model does not require packet-level information, but can
quickly map queue depths and predict multi-hop queuing delays. Using this model, we were able
to expose new phenomena that would not be apparent at lower levels of multiplexing.
The final component of this work is a traffic matrix estimation methodology that incorporates
those new parameters along with the volume of traffic for each full path through the network.
Ceiling and round trip time parameters were not used in earlier traffic matrix estimations because
it is difficult for an Internet Service Provider to collect that data. We present a novel technique for
inferring round trip times from easily gathered flow data at ISP edge nodes based on ACK ratio.
Paul Barford
ABSTRACT
Attempts to represent the global Internet in simulations and emulations have been difficult even
at the most basic levels. The focus of our work is Internet topology and traffic matrix estimation to
accurately predict Internet capacity, utilization, and congestion. We describe a forest representation
of the topology of the Internet which improves on prior topologies by being more complete and
accurate. We present a novel, scalable simulation environment that models the interactions of col-
lections of flows across multi-hop networks and can accurately predict the way highly multiplexed
traffic will react to congestion. We show that round trip time and ceiling not caused by congestion
have a strong influence on the way traffic reacts to congestion. We show mechanisms that group
large numbers of connections into units we call flocks and demonstrate that flock behavior can be
seen in actual one-way delay data. Our model does not require packet-level information, but can
quickly map queue depths and predict multi-hop queuing delays. Using this model, we were able
to expose new phenomena that would not be apparent at lower levels of multiplexing.
The final component of this work is a traffic matrix estimation methodology that incorporates
those new parameters along with the volume of traffic for each full path through the network.
Ceiling and round trip time parameters were not used in earlier traffic matrix estimations because
it is difficult for an Internet Service Provider to collect that data. We present a novel technique for
inferring round trip times from easily gathered flow data at ISP edge nodes based on ACK ratio.
Chapter 1
Introduction
1.1 Motivation and Approach
The research community would like to answer questions that are relevant and important to
the current Internet, but the task often proves difficult. The Internet is not owned, managed or
maintained by any single entity, so there is no single authority that can enforce policies or provide
data. How would the Internet react to catastrophes like natural disasters or intentional flooding?
Will the Internet be able to continue to grow gracefully as global demand grows? Is the Internet
appropriate technology for Video-On-Demand and other high-stress applications? The popularity
of Peer-to-Peer protocols like Napster caused a significant shift in demand. What would happen if
another new trend hit the Internet?
To address questions about the current state of the Internet, many researchers [20, 31] have
called for studies of “a day in the life” of the Internet. They propose collecting information about
the topology of the Internet and the traffic matrix showing which source nodes send how much
data to which destination nodes. Exploring a day in the life of the Internet enables us to consider
the scalability issues at a realistic level and helps us identify invariant properties that will give rise
to better models, metrics, and, ultimately, global Internet service that is dependable and efficient.
Many of the simplest questions are hard to answer. Consider a link between two nodes in a
heavily-interconnected network. What would happen to traffic flow if that link were broken? This
simple question will help us expose some of the invariants of the global Internet and will focus
our attention on two parameters often neglected in the parameter space because they are not easily
discovered from current protocols and equipment. Nonetheless, Chapter 3 shows that reaction time
and connection bandwidth ceiling are crucial to understanding congestion and, therefore, capacity.
[Figure 1.1 diagram: nodes A, B, C, and D; links A → B and A → C each labeled 65 / 100.]
Figure 1.1 It is surprisingly difficult to predict the changes that result from a simple change in the network.
In Figure 1.1, link A → C carries 65 units of traffic out of a capacity of 100, and link A → B
is similar. If link A → C were broken, where would the traffic go? Network routing would
quickly discover new routes but A → B would be asked to carry 30 units of traffic more than its
capacity. The result is congestion at link A → B. There are many proposals for how A should
react to the congestion, but the intent of all of those proposals is to ask the suppliers of data to slow
down. Typically, A will drop some packets. Soon after that, end-to-end congestion avoidance will
reduce the future traffic. What would the resulting traffic pattern look like? If A → B becomes
heavily over-subscribed, will the congestion become unacceptable? Will links like C → D actually
become less congested as a result of the death of A → C?
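The rerouting arithmetic behind this example can be made concrete with a toy sketch (the function name and the use of Python are my own illustrative choices, not part of the thesis):

```python
# Toy sketch of the rerouting arithmetic in Figure 1.1: if link A -> C fails,
# its 65 units shift onto A -> B, which already carries 65 of its 100-unit
# capacity. Names and numbers are illustrative, not from the thesis.

def overload_after_failure(shifted_units, existing_units, capacity_units):
    """Demand beyond capacity once a failed link's traffic reroutes."""
    return max(0, shifted_units + existing_units - capacity_units)

print(overload_after_failure(65, 65, 100))  # -> 30 units over capacity
```

The hard part, of course, is everything this sketch leaves out: how quickly, and from which senders, that 30-unit excess is abated.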
There are several unanswered research questions we will explore here.
• What is a congestion event? Do we measure congestion in minutes or in milliseconds? How
does a burst of losses relate to increases in queuing delay?
• In a highly multiplexed world, the distressed node, A, can choose to ask only a few suppliers
to slow down, or many. How many senders will slow down? Are we discouraging too many
or too few?
• What are the characteristics of suppliers that are important to the way they react to conges-
tion?
Once we know what a congestion event is,
• Where is the congestion in the Internet?
• Where do we expect congestion in the future?
1.1.1 Successful Congestion Abatement
We start by clarifying the timescales over which congestion can be studied. Zhang et al. [94]
introduce the notion of operational stability. They consider a parameter operationally stable if it
remains within bounds considered operationally equivalent. Consider a time scale of an hour. To
report that an hour is mathematically steady, it would have to be described with a single time-
invariant mathematical model. This is often too severe a test for operational purposes, because
many mathematical non-constancies are in reality irrelevant to a particular study. They further
reported that loss rate remains operationally stable on the time scale of an hour. We define a
congestion event on the much smaller timescale of a few times the connection reaction time. That
reaction time is approximately one round trip time to allow for the “please slow down” message
to reach the supplier and for the packets already in transit to pass through. Thus, we visualize
congestion events as discrete events with a clear start time (start of dropping or marking packets)
and a clear end time (empty queue). Because wide area round trip times are typically on the order
of a few milliseconds to a few hundred milliseconds, we expect reaction times to be in that range.
Each congestion event has a duration and a local intensity of packet loss. After each of those
events, the suppliers have, presumably, slowed down via multiplicative decrease [40]. Assume this
is enough to abate the congestion and let the loss rate drop to zero. For this discussion, assume
that the suppliers are TCP (or TCP-friendly) sources. The TCP sources will then accelerate by
re-growing their congestion windows. This is the additive increase mechanism TCP uses to probe
for better bandwidth. The time frame to grow back to a level that causes congestion depends
on the original size of the congestion window, but is typically many round trip times. During
the re-growth, there may be several seconds in which aggregate offered load is less than the link
capacity and no losses occur. Eventually, enough growth by enough connections will cause another
congestion event and the cycle starts again. Thus, when looking at packet loss rates, we may see a
sequence of congestion events. Each congestion event will be a brief burst of packet losses whose
duration is driven by the predominant reaction time followed by a relatively long, lossless period.
Over the course of an hour, a link may see many congestion events.
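The cycle just described can be sketched as a toy model (an idealized AIMD caricature with made-up numbers, not the thesis's simulator):

```python
# Toy model of the congestion-event cycle: after a loss, each TCP-like
# sender halves its congestion window (multiplicative decrease), then
# regrows it by one segment per round trip (additive increase) until the
# shared link is full again. Numbers and names are illustrative.

def rtts_between_events(capacity_segments, n_flows):
    """Round trips of lossless re-growth between two congestion events."""
    per_flow_share = capacity_segments / n_flows
    cwnd = per_flow_share / 2.0        # window just after the halving
    rtts = 0
    while cwnd * n_flows <= capacity_segments:
        cwnd += 1.0                    # additive increase: +1 segment per RTT
        rtts += 1
    return rtts

# Ten identical flows sharing a link that holds 1000 segments: each halves
# from 100 to 50 segments, then the group takes roughly 50 round trips to
# refill the link -- many RTTs of lossless re-growth, as argued above.
print(rtts_between_events(1000, 10))
```

In this caricature, the brief burst of losses is instantaneous and the long lossless gap is the entire loop, which is exactly the event/gap structure the text describes.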
The idealization of a congestion event led us to introduce the notion of a successful congestion
event. We define a successful congestion event as one which abates enough traffic to reduce the
aggregate demand on the congested link to a level less than the capacity of that link. From the
viewpoint of the queue of traffic leaving the link, this means the result of a successful congestion
event is that it evokes responses from a sufficient set of suppliers to abate traffic long enough for
the queue to drain. In contrast, an unsuccessful response to congestion would occur if the link were
unable to signal enough traffic to slow down. Chronic congestion is not covered in this thesis.
1.1.2 Where Congestion Occurs
A recent study of lossy links by Padmanabhan [66] tried to discover the most likely places
for losses. Not surprisingly, the links most closely watched were the links that cost money. In
the commercial Internet, small Internet Service Providers (ISPs) buy service from bigger ones in
an informal tiered hierarchy. Tier 1 can be thought of as a backbone. In the parlance of Border
Gateway Protocol (BGP), an Internet Service Provider is analogous to an Autonomous System
(AS) and is often used as the presumed border from one economic entity to another. AS’s often
have to pay other AS’s for connection to the backbone based on total traffic and a Service Level
Agreement (SLA). Padmanabhan states that:
. . . In 45% of cases, the identified lossy link crosses inter-AS boundaries and has a
high latency.
He went on to conclude that only 20% of losses come from links that are neither long nor inter-AS.
This gives us confidence that an Internet graph with one node per AS will still retain the important
edges.
1.1.3 Gap Between Congestion Events
If, as we propose, congestion abatement happens on the time scale of round trip times (RTT),
studying RTT is important. And if long-term average loss rates depend, ultimately, on the rate of
re-introduction of congestion, studying window growth must also be important. Our hypothesis
is that loss rates look stable on the time frame of an hour because congestion events are spread
throughout the hour. The gap between those congestion events represents the amount of time
TCP (or TCP-friendly) connections take to regain sufficient congestion window sizes to cause
congestion. Our evidence of losses was that congestion events were much farther apart than simple
window growth would predict. That led us to investigate causes for connections that do not grow
beyond a bandwidth ceiling.
1.1.4 New Models with the New Parameters
Once we had identified that RTT and ceiling were crucial to understanding link capacity and
fullness, we incorporated them into models that can be used to explain and explore congestion
phenomena. We hypothesized that an Autonomous System could do better traffic management
and traffic engineering if it could measure these crucial parameters and use them in such models.
These realizations forced us to return to the study of the graph of the Internet to look at connectivity
in light of AS boundaries and round trip times.
1.1.5 Topology
Because of the massive scale of the Internet, a useful Internet traffic model should have a
concise representation of the topology of the Internet. The list of nodes and links must be accurate
enough to let the research community test theories and identify weaknesses, but simple enough to
be tractable. Which aspects of Internet Topology are vital to understanding the functioning of the
Internet and which aspects are irrelevant? Does the composition of the traffic matter? Would a
model based solely on traffic quantity be fundamentally flawed?
One of our objectives was to discover the topology of the global Internet and then construct a
traffic matrix that we could apply to it on a collection of backbone routers in the Wisconsin Ad-
vanced Internet Lab [49]. The experiments in Chapter 2 are designed to discover relevant aspects
of the interconnections between Autonomous Systems in the Internet. The task is surprisingly dif-
ficult, since there is no single authority that knows all of the interconnections [31]. Moreover, the
business relationships between Internet Service Providers are confidential.
Publicly available information about the topology of the Internet is incomplete. It is based on
inter-domain routing and focuses on reachability rather than trying to enumerate all possible links.
Worse yet, some of the links that are present in the public tables are unidirectional. A small number
of tier-1 long-haul providers sell service to many small or local tier-n Internet Service Providers.
Cost considerations often prevent small domains from providing transit to anyone outside of their
autonomous system. Those small autonomous systems are logically on the periphery of the Inter-
net. Links to them are, in that sense, unidirectional from the lower-numbered tier to the final tier.
In general, autonomous systems do not provide transit from one of their providers to another of
their providers.
Rather than think of the Internet as a single, large, complex graph, we used a clustering method
to separate out the centroid (the trans-continental and trans-oceanic backbone) component from the
myriad trees of national, educational, regional, research, and local components. Then, we used an
iterative method to discover a likely spanning tree for each of the latter components of the Internet.
Combining the centroid with those trees makes a “forest” representation of the Internet that is very
concise.
The spanning tree was a convenient form for simple algorithms. It was easy to run analyses
on trees and keep the computational cost practical. Unfortunately, even the best spanning trees
we could invent were hopelessly inaccurate when tested against traceroutes run through the real
Internet. One of the primary reasons for this is that Internet Service Providers have multiple ways
to send packets to the rest of the Internet. Some links can only be used by specific IP address pairs
(e.g. in research or educational networks) and some links only carry traffic from appropriately
secure IP addresses. Moreover, a wide variety of unpublished, peer-to-peer, and backup links exist
(and get used) but would be very hard to discover.
Chapter 2 describes our way of testing an Internet graph by sending traceroute requests to
traceroute servers scattered throughout the Internet. A machine learning algorithm allowed us to
add alternate parents to nodes in the forest until a desired level of accuracy was reached. The
augmented forest is accurate enough for our lab-based emulation. However, the augmentations
make the tree portions of the forest no longer acyclic. This led us to question whether the trade-off of
extra accuracy was worth the extra cost of running less-efficient algorithms in large-scale analyses.
We developed a novel dynamic programming solution that works very quickly to come up with
a provably optimal solution in the strict forest case. We then apply it to a cache placement problem,
test it for speed, enlarge the algorithm to cover the extra links (needed for reasonable fidelity) and
retest. Results in Chapter 2 show that the enlargements to the algorithm do not substantially
change the complexity of that typical analysis.
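The dynamic program itself is presented in Chapter 2. As a toy illustration of why strict trees keep such analyses cheap, the sketch below (a simplified model of my own naming, not the dissertation's algorithm) scores a single candidate cache placement, assuming each request travels up toward the root until it hits a replica:

```python
# Simplified model, not the dissertation's algorithm: score one candidate
# cache placement on a tree, assuming each request travels up toward the
# root until it hits a replica (the root acts as the origin server).

def placement_cost(parent, demand, caches, root):
    """Total demand-weighted hop count to the nearest replica on the root path."""
    cost = 0
    for node, volume in demand.items():
        hops, cur = 0, node
        while cur not in caches and cur != root:
            cur = parent[cur]
            hops += 1
        cost += volume * hops
    return cost

# Toy tree: root a, interior node b, leaves c, d, e with request volumes.
parent = {"b": "a", "c": "a", "d": "b", "e": "b"}
demand = {"c": 2, "d": 10, "e": 5}
print(placement_cost(parent, demand, set(), "a"))   # no replicas -> 32
print(placement_cost(parent, demand, {"b"}, "a"))   # replica at b -> 17
```

Evaluating one placement costs only a walk up each demand node's root path; it is the search over placements, and the extra non-tree links, that the dynamic program must tame.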
1.1.6 Scalable Simulations
Chapter 3 explores ways to scale up simulations to levels that would be unrealistic using packet-
by-packet simulation tools such as ns2 [89]. Internet2’s United States backbone is Abilene. The
next-generation portion has 11 nodes and 15 links, most of which run at 10.2 gigabits per second.
Each link has the capacity to carry tens of thousands of simultaneous connections.
Conventional wisdom holds that statistical multiplexing should make the variations in volume
less pronounced as the number of independent connections, n, increases: the mean of the aggregate
grows linearly in n while its standard deviation grows only as the square root of n. If this is true,
fast links with n > 10,000 should have a high mean and relative variation that is operationally
inconsequential. Countering that is the
argument that those TCP connections each react using a deterministic control system. If TCP con-
nections resonate with each other, there may be waves of congestion. Studies of various kinds of
resonance collectively refer to such phenomena as global synchronization [29]. We demonstrate
that window synchronization, one of the forms of global synchronization, defies the independence
assumption and show how connections can resonate with other connections whose RTT is similar.
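The independence argument can be illustrated numerically (a sketch under illustrative assumptions; the uniform per-flow volumes and sample sizes are my own, not measurements):

```python
import math
import random

# Numeric illustration of the independence argument: summing n independent
# per-flow volumes gives a mean that grows like n but a standard deviation
# that grows only like sqrt(n), so the aggregate's relative fluctuation
# shrinks as 1/sqrt(n). Window synchronization breaks exactly this
# independence assumption.

random.seed(1)

def relative_variation(n, trials=500):
    """Std/mean of the sum of n independent uniform(0, 1) flow volumes."""
    sums = [sum(random.random() for _ in range(n)) for _ in range(trials)]
    mean = sum(sums) / trials
    var = sum((s - mean) ** 2 for s in sums) / trials
    return math.sqrt(var) / mean

for n in (100, 10000):
    print(n, round(relative_variation(n), 4))  # shrinks ~10x for 100x flows
```

Under independence, a link carrying 10,000 flows fluctuates by well under one percent of its mean; the Surveyor evidence discussed below is what makes that assumption suspect.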
Should we worry that global synchronization will cause catastrophic Internet collapse and gridlock?
We found that the effect is neither severe nor persistent enough to cause such oscillations in
the foreseeable future, but the ripples caused by window synchronization are valuable indicators
of bottlenecks and remote congestion.
We show window synchronization in long-lived flows in a traditional, small simulation envi-
ronment. But that doesn’t necessarily mean that this phenomenon is still significant at high levels
of multiplexing. By harvesting Surveyor [45] data we found evidence that one-way delay probes
see full queues far more often than queuing theory would have predicted.
That led us to develop a scalable model that accurately predicts the queue depths over time
along multi-hop paths in an environment much more complex than could be handled by a packet-
by-packet simulation. The output of the model was especially sensitive to two parameters that
control the way connections react to congestion: RTT and a ceiling which, at the time, we thought
was a bottleneck elsewhere in that connection’s sojourn. That model takes a topology description
and a traffic matrix and computes the duration, intensity, and quantity of congestion events on each
link. The parameter space of the model is intentionally limited to those parameters we felt were
most relevant to groups of long-term TCP and TCP-friendly connections over long distances. Such
special-purpose models [30] can often bring clarity and insight to particular phenomena without
inappropriate complexity.
1.1.7 A Matrix of Traffic Demands
Finally, Chapter 4 uses IP flow measurements from the Abilene network along with measure-
ments of the artifacts of congestion to construct a traffic matrix. Finding the volume of data passing
from one source to one destination was easy using flow data gathered as though we were doing
accounting. But discovering the RTT and the ceiling for each flow proved more elusive.
We devised a technique for inferring RTT and ceiling from the ratio of data packets to ACK
packets. Connections with a high Bandwidth Delay Product (BDP) tend to use delayed ACKs.
The ratio of (forward) data packets to (reverse) ACK packets is bi-modal in the data we analyzed.
Connections that have a slow last-mile technology (e.g. dialup modems) are far less likely to use
delayed ACKs. In fact, we found stretch ACKs responding to more than 2 data packets were very
common in Abilene.
Once a connection’s RTT is known, we can infer its ceiling by computing the average number
of packets per RTT. A surprising number of flows had ceilings that were much lower than would
have been expected from the BDP. We investigated to see if they had congestion losses to keep
their throughput down, but they did not. A portion of Chapter 4 investigates instances of Receive
Window Limited connections and Send Window Limited connections. We found them to be far
more prevalent in Internet2 than we expected.
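As a back-of-the-envelope sketch of the ceiling computation just described (the function name and numbers are illustrative, not from the actual analysis):

```python
def infer_ceiling(total_packets, duration_s, rtt_s):
    """Estimate a connection's ceiling (maximum window, in packets)
    as the average number of data packets observed per RTT."""
    if duration_s <= 0 or rtt_s <= 0:
        raise ValueError("duration and RTT must be positive")
    return (total_packets / duration_s) * rtt_s

# A flow that sent 5000 packets in 10 seconds over a 50 ms RTT path
# averages 25 packets per RTT -- its inferred ceiling.
ceiling = infer_ceiling(5000, 10.0, 0.050)
```

A ceiling well below the bandwidth-delay product of the path, with no accompanying losses, is the signature of the window-limited connections discussed above.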
Because the fiber-optic backbone links are too fast for comprehensive monitoring, flow data is
taken on only a 1:100 sample of the packets. Would we still be able to infer RTT and, by exten-
sion, ceiling for an Autonomous System even in a sampled environment? A portion of Chapter 4
addresses the problems associated with using sampled data.
We chose to aggregate Autonomous Systems into groups based on their attachment point to
Abilene and their approximate distance from Abilene based on RTT. Any IP address in the group
would have the same attachment point to Abilene and roughly the same delay. We then chose only
2 categories of delay. Thus, each group consists of an attachment point (e.g. Indianapolis) and
a delay beyond Abilene (e.g. 2 milliseconds from Indianapolis to Bloomington). From Abilene’s
point of view, connections to or from those IP addresses would take the same paths through Abilene
and see the same extra delay. Our assumption was that any IP addresses in the group could be
considered equivalent for the purposes of our study.
Our ultimate traffic matrix is constructed with one row and one column for each group. The
content of the cell at that intersection is the quantity of traffic (estimated from flow data). Flows
are assigned an RTT (directly taken from row plus column delays, but originally estimated from
AS ACK ratios and throughput) and a ceiling (based on throughput of memory limited connections
to that AS).
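The matrix construction described above can be sketched as follows (the group keys and flow fields are illustrative assumptions, not the actual data format):

```python
from collections import defaultdict

def build_traffic_matrix(flows):
    """Build a group-by-group traffic matrix.

    Each flow is a dict with illustrative fields:
      src_group, dst_group -- (attachment point, delay category) tuples
      bytes                -- traffic volume estimated from flow data
    Returns {(src_group, dst_group): total_bytes}.
    """
    matrix = defaultdict(int)
    for flow in flows:
        matrix[(flow["src_group"], flow["dst_group"])] += flow["bytes"]
    return dict(matrix)

flows = [
    {"src_group": ("Indianapolis", "near"), "dst_group": ("Seattle", "far"), "bytes": 1200},
    {"src_group": ("Indianapolis", "near"), "dst_group": ("Seattle", "far"), "bytes": 800},
]
matrix = build_traffic_matrix(flows)
```

Per-flow RTT and ceiling are then attached to each cell as described above, from the row and column delays and from the throughput of memory-limited connections respectively.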
1.2 Contributions of this Work
This thesis makes contributions in the following areas:
1. A succinct AS-level graph of the Internet that accurately reflects the routing of traffic across
links and contains the links most likely to have congestive losses.
2. A method for annotating the AS-level graph based on fresh traceroutes.
3. Demonstration of the importance of RTT in congestion and congestion propagation in high
speed backbones.
4. Demonstration that window synchronization scales to high multiplexing factors.
5. Demonstration that the evidence of window synchronization can be used for network engi-
neering tasks.
6. A model for predicting the variations in queue depth (and, therefore, delay) in congested
links even at high multiplexing factors.
7. A mechanism that can infer RTT from delayed and stretched ACKs.
8. Evidence that memory-limited connections are far more prevalent in high-speed long-haul
backbones than previously expected.
9. Improved understanding of the way memory-limited connections reduce the ability of traffic
to grow back quickly after congestion.
1.3 Thesis Outline
In Chapter 2 we develop techniques for discovering and analyzing the AS-level links in the
Internet. The resulting Internet graph is both succinct and significantly more accurate than prior
graphs when used to predict packet sojourn.
Chapter 3 investigates congestion and develops a model that exposes the traffic parameters that
need to be captured to characterize the traffic. By simulating high speed links and high levels of
multiplexing, we study congestion event onset, duration, and intensity. This model differs from
prior work in that it summarizes large collections of connections into tractable flocks whose char-
acteristics simulate connection-level traffic without the need for packet-level detail. This allows
much more scalable studies of multi-hop and networked traffic with large numbers of routers and
complex interconnections.
In Chapter 4 we use easily-gathered summary flow data and infer RTT to create a traffic matrix
that is appropriately accurate for emulating a large, trans-continental ISP.
Finally, in Chapter 5 we review related work.
Chapter 2
Topology of the Internet
To study the way traffic flows in the Internet, we decided to construct a graph of a significant
portion of the Internet and apply traffic to it. This chapter shows how we decided what form our
graph would take, then how the excess links were pruned from that graph to make it more compact.
To improve accuracy, links were then added whenever traceroutes showed significant new links.
The goal of this chapter is to create a graph that can be combined with a traffic matrix we will
develop in Chapter 4. To motivate the study of Internet topology, we use an example of services
that are geographically and topologically dispersed in the Internet. For example, a company pro-
viding real-time streaming video might want to place an affordable number of servers in carefully
selected places in the Internet to minimize the number of customers whose ping time exceeds 150
milliseconds.
Routing in the Internet often requires packets to travel much farther than the shortest distance
from the sender to the receiver. There are a few, obvious geographic features like major oceans
that are expensive to cross, but the commercial Internet also has other long paths. In part this
is the result of the business relationships between Internet Service Providers. A packet moving
from an educational institution to a research facility may travel on a subsidized research network,
while another packet to a commercial website might not. Section 2.6 shows why small ISPs do not
provide transit services between their providers.
It is important to treat the highly-connected core of the Internet differently than the small ISPs
on the edges. A few ISPs have connections to hundreds of other ISPs. This core component is so
highly interconnected, that it is appropriate to model them as a clique we will call the forest floor.
The forest floor provides extremely stable routing with professionally managed fault tolerance and
very high bandwidth. This chapter builds a graph of the Internet that can be thought of as a forest
– a collection of trees connected to that forest floor. Small regional, local, and leaf ISPs have much
smaller out-degree, so we model clusters of them as trees. In the context of our graph, the forest
floor facilitates reliable, high volume movement between the trees.
To test the utility of this graph of the Internet, we present a novel, very fast algorithm that
determines the optimal locations for placing services in a strict forest. Then we augment the forest
by adding links that significantly improve the accuracy of the graph with only a small impact on
the performance of the algorithm. The graph is no longer a strict forest. The trees are no longer
acyclic and mutually disconnected. We have not proved and we do not claim that the result of
running the algorithm on the augmented graph is optimal.
2.1 The Need for a Succinct Internet Graph
Content Delivery Networks (CDNs) distribute caches in the Internet as a means for reducing
load on Web servers, reducing network load for Internet Service Providers and improving perfor-
mance for clients. In order to effectively deploy and manage cache and network resources, CDNs
must be able to accurately identify areas of client demand. One means for doing this is by clus-
tering clients that are topologically close to each other, and then placing caches in the areas where
demand is typically large. This raises two immediate questions: how can clusters of clients be
computed and once identified, how can caches be placed among the clusters so as to maximize
their impact?
In this chapter, we address the question of client clustering by presenting a new method that
generates a hierarchy of client clusters. As opposed to prior work on IP client clustering described
in [47], our method uses autonomous systems as the basic cluster unit. We argue that clustering at
the IP level results in cluster units which are too detailed, and too numerous and thus do not readily
lend themselves to higher levels of aggregation. In contrast, clustering at the AS level provides a
natural means for not only identifying clients which should experience similar performance from
a given cache but also for aggregating AS’s into larger groups which should experience similar
performance.
We will use the problem of distributing content delivery caches as an example to motivate
our clustering method. A CDN must clearly understand demand in order to distribute a finite
number of caches to the most effective places in the topology. Our clustering
method enables groups of AS’s to be coalesced into larger groups based on best path connectivity
extracted from BGP routing tables. We use best paths because these are typically the preferred
route between an AS and its immediate neighbors. The difficulty is that best paths do not indicate
anything about quality of a connection beyond immediate neighbors.
We address this problem by introducing the notion of Hamming distance between a pair of
connected AS's. Hamming distance was introduced in [84] as the minimum number of elements which
must be changed to move from one set to another. For example, the Hamming distance between
{1,3,5,7} and {1,2,3,4} is four because {2,4,5,7} appear in one but not both of the sets. In our
context, Hamming distance is applied as a measure of similarity of AS connectivity. Specifically,
two nodes with a short Hamming distance indicate that they have many neighbors in common and
are thus candidates for merging into a cluster.
The length of a connection is the Hamming distance between the neighbor sets of the AS’s it
connects. AS’s with minimal Hamming distance are successively coalesced. By reading the BGP
table entries, we construct an AS graph where each AS is a vertex and each edge represents a direct
connection between those AS’s. Imagine 2 nodes of the AS Graph whose edges connect to highly
correlated sets of vertexes. The Hamming distance between those neighbor sets would be small. If
the algorithm decides to coalesce those two vertexes, one of the vertexes will become the exemplar
of the new cluster, and the other will become a child of that exemplar.
Our clustering algorithm removes edges from the AS graph until all that remains is a forest of
trees. The benefits of making a forest are: (1) objectively identifying a small number of vertexes
that can be treated as the backbone of the Internet and (2) assigning each AS to one and only
one tree so that tractable algorithms can be used to predict the paths packets will take going to
or coming from the backbone. It is implicitly assumed that the backbone vertexes are tightly
interconnected (ideally, a clique) and that packet transfers between backbone vertexes are very
fast.
Our algorithm starts by coalescing nodes whose path to the backbone is uncontested, forming
small clusters of nodes whose only known path to the bulk of the Internet passes through a common
parent. In the BGP tables we examined, clusters were seldom that obvious. In order to form larger
clusters, the algorithm successively relaxes the Hamming distance requirements for clustering.
If we relax the Hamming distance requirements too far we would eventually collapse the entire
network to a tree with a single root node. Our intention, however, is to only collapse the topology to
a size which readily enables evaluation of demand and facilitates our cache placement algorithms.
The result of our clustering algorithm presented in this chapter is a forest of 21 root AS trees.
These root AS’s consist of many of the major ISPs such as BBNPlanet and AT&T, but also some
smaller ISPs such as LINX due to the nature of the algorithm. The root AS’s connect on average
with 7.29 other root AS's, indicating a high level of connectivity between these nodes. The average
out-degree of the root AS's (i.e., the number of AS's with whom they peer) is 198 with a median of
97, indicating that the root AS's facilitate Internet access to a large number of other AS's.
It is also important that the forest minimizes the amount by which it overstates the path lengths
between vertexes in the original graph. To test that, we measured paths in terms of AS hops. In
the original graph, the average number of AS hops to those 21 tree roots is 1.61. The average tree
depth in our graph is 1.96. This gave us confidence that our forest does not misrepresent AS hop
distance significantly. These characteristics indicate that while a forest is an idealization of the
actual AS topology, it does not abstract away essential details.
To test our topology, we ran 200,000 traceroutes and quickly found that the BGP-based forest
did a dismal job of predicting packet paths. Our forest had implicitly assumed that nodes with
more connections toward the backbone were providers and nodes with fewer connections were
their customers. Leveraging the insights of Gao, et al. [33], we endeavored to discover which links
were uni-directional because they were customer-to-provider links.
Using a simple machine learning approach, we refined the forest by adding annotations to each
vertex with our guess about the tier of the node. A link from a low-tier AS to a higher-tier AS
indicates the relationship of a customer (higher-tier) and a provider (lower-tier). Similarly, we
tried to infer sibling and peer status. The results were still sadly inaccurate.
The breakthrough that allowed us to dramatically improve the forest was, ironically, additions
that made it no longer a forest of trees. We added up to one extra link from each customer to an
alternate provider based on the preponderance of the traceroutes in our training set. Now that tier-n
nodes could have up to 2 parents, trees were now mini-graphs. There were links that connected
mini-graphs to other mini-graphs and we had to depend on the unidirectional notation to avoid
cycles. The result was a graph that correctly classified 91% of the traceroutes in the test set.
One domain to which our forest of AS's naturally lends itself is cache placement. Since our
tree generation algorithm is based on best path information from BGP tables, it enables caches to
be placed on AS hop paths which would actually be used in the Internet. This study assumes that
placing a cache in an AS is sufficient to satisfy all demand from that AS (as well as the AS's children
which are part of its cluster). We make this assumption based on the idea that most performance
problems occur across AS boundaries and that performance within an AS is generally good. Our
analysis of cache placement effectiveness focuses on the reduction of inter-domain traffic. There
is clearly an additional benefit of improving client performance which is a simple extension of our
work.
Placement of caches in trees has been treated as a dynamic programming problem by Li, et
al. [52]; however, the means by which trees were created was not treated in that work. We address
the issue of optimal cache placement by describing a dynamic programming algorithm in which
each subtree calculates the optimal use for 0 to ℓ caches in its subtree. Each parent node can then
discover the maximum benefit from ℓ caches by distributing all of the caches among its children or
by retaining one cache for itself. We also present a greedy algorithm which iteratively chooses the
AS with largest unsatisfied demand as the next site to place a cache.
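A minimal sketch of this style of tree dynamic program, under a simplified demand model in which a cache at a node satisfies a fixed amount of demand (all names and numbers are hypothetical; Dr. Cai's algorithm handles the full cost formulation):

```python
def place_caches(tree, demand, root, k):
    """Max demand satisfied by placing up to k caches in the subtree of root.

    tree   -- {node: [children]} adjacency of the forest
    demand -- {node: demand satisfied if a cache is placed at that node}
    Returns a list best[j] = max demand satisfied with j caches, j = 0..k.
    """
    best = [0] * (k + 1)
    # Combine the children's subtrees with a knapsack over cache counts.
    for child in tree.get(root, []):
        child_best = place_caches(tree, demand, child, k)
        merged = [0] * (k + 1)
        for j in range(k + 1):
            for c in range(j + 1):
                merged[j] = max(merged[j], best[j - c] + child_best[c])
        best = merged
    # Optionally retain one cache for the root itself.
    for j in range(k, 0, -1):
        best[j] = max(best[j], best[j - 1] + demand.get(root, 0))
    return best

tree = {"A": ["B", "C"]}
demand = {"A": 5, "B": 3, "C": 4}
result = place_caches(tree, demand, "A", 2)
```

Each recursive call returns the whole benefit vector for 0 to k caches, so the parent's knapsack step can weigh retaining a cache for itself against pushing caches down to its children.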
We evaluate the effectiveness of these two algorithms by comparing their total cost of traffic
when 0 to 50 caches are placed. We find that optimal placement of a small number of caches does
measurably better than random placement, but that greedy placement performs surprisingly close
to optimal when more caches are deployed.
The remainder of this chapter is organized as follows: Section 2.2 describes our process for
constructing client clusters using BGP routing data; and Section 2.3 describes the results of eval-
uating client demand from a Web log using our clustering results. In Section 2.4 we present our
algorithms for optimally placing caches based on client demand distribution. In Section 2.5 we
demonstrate the effectiveness of our cache placement methods. In Section 2.6 we use the results
of traceroutes to identify the customer-provider relationships and improve the accuracy of the
graph. In Section 2.7 we summarize our results and conclude with directions for future study. In
the chapter on related work, Section 5.1 discusses research related to Internet topology and clustering.
2.2 Topologically-guided Clustering
A study of sources and destinations of traffic in the Internet quickly becomes a search for a
productive way to summarize large bodies of traffic into meaningful categories. Categorizations
based on geography are natural, but they are an increasingly inaccurate representation of the topol-
ogy of the Internet. A house in the suburbs of Buenos Aires, Argentina is 9000 kilometers away
from wisc.edu, but a connection between them may have much better throughput and latency than
connections that seem to travel only a hundred kilometers from an ISP in Poland to an ISP in
Romania.
Our algorithm discovers the topology of the Internet by reading the best path data from BGP
routing tables [83].
To forward a packet, one might think a router only needs to know which of its links to use
for the next hop. A subsequent router will make decisions to get the packet even closer to its
destination. Fortunately for us, BGP tables [83] contain a great deal of information about connections
beyond the next hop, which enables us to construct an AS graph without having to query
every BGP router in the world. In the early days of Internet routing the designers wanted each BGP
advertisement to contain the entire path of Autonomous Systems used to deliver a packet. This
gives BGP routers full disclosure of the AS path their packets will take, so the packets of one company
(perhaps containing trade secrets or sensitive E-mail) would not pass through arch-enemy
autonomous systems. The AS path can still be used for that purpose today.
We simplify the graph of AS connectivity into a forest of trees to facilitate our analysis. We
found clusters of nodes with high mutual affinity by comparing their neighbor sets. We then
iteratively applied the same technique to identify clusters of clusters (super-clusters), and so on
until there were only a few, very large clusters left. Our algorithm identified 21 such super-clusters.
They form the first level of the forest of trees. As of 2001, a dozen of them are almost completely
interconnected. Since the tree representation loses information about cross-links between branches
of the tree, it is important that our algorithm minimize the impact on distance calculations using
the trees.
Our work extends the IP clustering work done by Krishnamurthy and Wang [47] showing
how BGP routing tables can be used to gain 99 percent accuracy in partitioning IP addresses into
non-overlapping groups. All IP addresses in a group are topologically close and under common
administrative control. Their client clustering paper shows other more involved techniques for
gaining even higher accuracy and validating the results.
The basic unit of clustering used by our algorithm is the combination of all of the IP ranges
that share a common AS number. Although clustering by AS is less specific than IP clustering,
the IP addresses in our clusters share common routings. Without common routing, applications of
clusters such as cache placement may not be meaningful.
Definitions
The clustering algorithm uses neighbor sets, a boolean notion of one AS being a potential
parent of another AS, a distance function that acts as the length of a link and an overhang function
that measures the amount by which a potential parent fails to completely dominate a child.
The following definitions are used throughout this chapter:
• ASn is a neighbor of ASm if it immediately follows or precedes ASm in any best path. To
simplify the algorithm, ASn is always added to its own list of neighbors.
• The set of neighbors of ASn is denoted by Nn. The parent of ASn is p(n), initially 0,
meaning undefined.
• The exemplar of a cluster of AS’s is the parent of all other nodes in the cluster. The neighbor
set, Ne, of the cluster is maintained under ASe, where e is the AS number of the exemplar.
• The outdegree, outdegree(n) is the initial |Nn|. Although the neighbor set changes during
the coalescing of clusters, it is important to note that outdegree of an AS always refers to
the original outdegree, before any clustering. The outdegree of a cluster is defined to be the
outdegree of its exemplar AS.
• ASn is said to dominate ASm if Nn ⊃ Nm. In particular,
dom(n,m) ≡ (Nm \ Nn = ∅) ∧ (Nn \ Nm ≠ ∅)
• The Hamming distance between ASn and ASm is the number of neighbors exclusive to
only one of them.
hdist(n,m) ≡ |Nn ∪ Nm| − |Nn ∩ Nm|
• The overhang of ASn over ASm is the size of the set of Neighbors of n who are not also
Neighbors of m.
overhang(n,m) ≡ |Nn \ Nm|
• Each node has a set of candidate parents, Cn, that is recomputed as the algorithm pro-
gresses.
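These definitions translate directly into set operations; a sketch in Python (using built-in sets for the neighbor sets):

```python
def dominates(N_n, N_m):
    """ASn dominates ASm: every neighbor of m is also a neighbor of n,
    and n has at least one neighbor that m lacks (proper superset)."""
    return not (N_m - N_n) and bool(N_n - N_m)

def hdist(N_n, N_m):
    """Hamming distance: the number of neighbors exclusive to one AS."""
    return len(N_n ^ N_m)  # symmetric difference

def overhang(N_n, N_m):
    """Neighbors of n that are not also neighbors of m."""
    return len(N_n - N_m)

# The chapter's example sets: hdist({1,3,5,7}, {1,2,3,4}) is 4,
# since {2, 4, 5, 7} appear in one set but not both.
```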
Clustering AS’s using BGP routing data
To construct hierarchical trees of AS’s we needed to find the best assignment of small clusters
(AS’s with small out-degree) to larger clusters. For this study, we extracted “best path” data from a
routing table acquired from Oregon Route-views [90] dynamically on Feb. 20, 2001. BGP routers
typically receive multiple paths to the same destination. The BGP best path algorithm decides
which is the best path to install in the IP routing table and to use for forwarding traffic. These paths
tend to use the highest-throughput, lowest-latency links. Our algorithm has no other means to discover
that information directly.
Our study includes only best paths, thus some feasible routes are ignored. In particular, routes
that connect AS’s far from the backbone to other small AS’s won’t be seen. We investigated
using all paths and found that low-bandwidth paths for fault tolerance and historical paths with
comparatively low bandwidth made the clustering results volatile. Routing tables from different
sources would significantly change the computed clustering.
Clustering is performed by successive passes through the graph, building large clusters by
visiting small clusters and merging them into existing larger clusters.
For each clustering pass, each node, n, without a parent (i.e. p(n) = 0 ) tries to find a
suitable parent. Conceptually, the candidate parents are the nodes which dominate it, Cn =
{m ∈ Nn|dom(m,n)}. In practice, this is too strict a requirement and we will define Cn more
suitably below. Now, find the nearest among the candidate parents, m ∈ Cn. The best parent is
nearest(n) = arg min_{m ∈ Cn} hdist(n,m)
If Cn 6= ∅, Node n is merged into the cluster of the best parent, m. Now p(n) is set to m and n
is removed from Nm. Note that n is not removed from other neighbor lists, since n might later be
chosen as a parent by an even smaller cluster.
An interesting design decision happens in situations where Nm = Nn, neither neighbor list is
a proper superset of the other and neither dominates. We defined domination in this way so both
nodes are free to become siblings under some other parent, keeping the tree comparatively shallow.
If n or m had been arbitrarily chosen as parent, the other (and its subtree) would appear to be one
AS hop farther from the backbone.
It might also be meaningful to define the best parent as the farthest candidate parent. This
would cause AS’s to choose AS’s with very high out-degree as their preferred parent. The result
would have been a shallower tree that more closely matches the distance to the backbone, but it also
would have lost the useful categorization of AS’s into clusters with very similar sets of neighbors.
In practice, many AS’s connect to more than one major provider. These AS’s are not strictly
dominated by any one of the nodes they have links to. To relax the domination requirement, a
tolerance factor grows with each pass through the nodes without parents. The tolerance, δ, allows
a node to become a child of any node with a higher out-degree if the overhang is less than the
current tolerance. δ drives the speed at which the clustering completes. So the actual computation
for the set of candidate parents is:
Cn = { m ∈ Nn | overhang(n,m) ≤ δ ∧ outdegree(m) > outdegree(n) }
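One pass of the relaxed clustering procedure might be sketched as follows (data structures are illustrative; the real implementation also maintains exemplars and merged neighbor sets):

```python
def clustering_pass(neighbors, outdegree, parent, delta):
    """One pass: each parentless node joins its nearest candidate parent.

    neighbors -- {as_number: set of neighbor AS numbers} (mutated on merge)
    outdegree -- {as_number: original |Nn|, fixed before any clustering}
    parent    -- {as_number: parent AS number, or 0 if unassigned} (mutated)
    delta     -- current overhang tolerance
    """
    for n in list(neighbors):
        if parent[n] != 0:
            continue
        candidates = [m for m in neighbors[n]
                      if m != n
                      and len(neighbors[n] - neighbors[m]) <= delta  # overhang(n, m)
                      and outdegree[m] > outdegree[n]]
        if not candidates:
            continue
        # Nearest candidate by Hamming distance between neighbor sets.
        best = min(candidates, key=lambda m: len(neighbors[n] ^ neighbors[m]))
        parent[n] = best
        neighbors[best].discard(n)  # n leaves its new parent's neighbor set

neighbors = {4: {4, 5, 7}, 5: {4, 5, 8}, 7: {4, 7}, 8: {5, 8}}
outdegree = {4: 3, 5: 3, 7: 2, 8: 2}
parent = {4: 0, 5: 0, 7: 0, 8: 0}
clustering_pass(neighbors, outdegree, parent, delta=0)
# With zero tolerance, only the strictly dominated nodes coalesce:
# AS 7 joins AS 4 and AS 8 joins AS 5.
```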
[Figure 2.1 omitted: four panels show the original graph and the state after passes 1, 2, and 5;
ellipses represent more than 5 neighbors not shown.]
Figure 2.1 Walk-through of the clustering algorithm
Cluster generation example
A simple example demonstrates how the clustering operates in practice. In Figure 2.1, AS 2,
AS 3, and AS 6 are connected to many other nodes. In this example N7 = {4, 7} is dominated by
N4 = {4, 5, 7} so dom(4, 7) = true. For each pass, each node makes a list of candidate parents.
During the first pass, AS 7 coalesces with AS 4. AS 4 is now the exemplar for a cluster and AS
7 is removed from N4 reducing it to {4, 5}. The parent of AS 7, p(7), is set to 4. Similarly, AS 8
is dominated by AS 5. During the second pass, AS 4 coalesces with AS 5 to form an even bigger
cluster with AS 5 as the exemplar.
In the third pass, the algorithm has nothing to coalesce, since no node is dominated by any
single neighbor. In this case N1 = {1, 2, 3, 5} is not dominated by AS 2, AS 3, or AS 5. Since
AS 1 connects to one node (AS 2) missing from the AS 5 list, overhang(1, 5) = 1. Similarly,
overhang(5, 1) = 1 because of AS 6.
In a later pass, the tolerance grows above 1.0 and the candidate parent set of AS 1 becomes
C1 = {3, 5}. The nearest of these is AS 5, so AS 1 coalesces with AS 5. During the same pass,
the candidate parents of AS 5 become C5 = {3}. Note that AS 1 is not a candidate parent of AS
5 because it originally had a smaller outdegree.
In the example, AS 7 would be denoted as AS3.5.4.7. The name shows the relationship that
AS 7 is a child of the progressively larger super-clusters. Clients in AS 7 would benefit (albeit
progressively less) from caches on the path to the backbone.
[Figure 2.2 omitted: the left panel plots clusters remaining (log scale) against pass number; the
right panel plots cumulative nodes against hops to the backbone for the full graph and the derived
tree.]
Figure 2.2 Results of AS cluster formation. The left graph shows how the number of clusters
declines as clusters are coalesced. The right graph shows how the path length in the derived tree
compares to the path length in the original graph of best paths.
Results of AS clustering
For this study a δ tolerance growth of 0.25 per pass was chosen. Figure 2.2 shows the number
of clusters at the end of each pass through the list of AS’s. The first four passes cluster all of
the easily-classified AS’s with small out-degree. Passes five through ten found a large number
of national, government, and educational transit AS's. After pass 37, further reduction in the
number of clusters takes much longer. To avoid excess layers at the top of the tree, we stopped the
algorithm at pass 40 and declared the 21 remaining exemplars to be the roots of the forest of 21
trees.
Figure 2.2 compares the cumulative distribution of distances to the backbone in both the origi-
nal full graph and the tree left at the end of clustering. The maximum distance from the backbone
was 5 in the full graph but rose to 8 in the forest. There were only 56 nodes in the forest farther
than 5 hops from the backbone. This matched our goal for the backbone since over 90 percent of
the 6395 nodes are within 2 hops of a backbone node in the graph and within 3 hops of a backbone
node in the forest. The average node is 1.61 hops away from the 21 “backbone” nodes in the full
graph, and 1.96 hops away from those same 21 nodes in the computed forest.
The resulting clustering contains 21 large trees, each headed by a particular AS. Table 2.1
shows the names of those Autonomous Systems. The list does not contain some of the AS’s with
Table 2.1 Clusters Identified as Backbone by the Algorithm
Clstr # Exemplar AS Members Out Degree Peers Depth
1 2914: Verio 150 235 13 5
2 1: BBNPlanet 171 284 12 4
3 701: Alternet 492 878 12 8
4 7018: AT&T 281 374 11 4
5 2828: Concentric 30 85 9 5
6 3549: Globalcenter 33 60 9 2
7 3561: Cable&Wireless 287 482 9 5
8 6453: Teleglobe 57 124 9 6
9 293: ESnet 41 112 8 5
10 1239: Sprint 407 645 8 5
11 2497: JNIC 45 82 8 6
12 3356: Level3 33 60 8 3
13 209: QWest 83 112 7 4
14 3300: Infonet-Europe 21 40 6 3
15 702: UUNet-Europe 56 80 5 5
16 1221: Telstra 27 61 5 1
17 1755: EBone 59 97 4 6
18 5378: INSNET 32 59 4 8
19 1849: PIPEX 26 47 3 5
20 2548: ICIX 158 189 2 3
21 5459: LINX 26 49 1 4
[Figure 2.3 omitted: cumulative nodes against hops to the backbone, for the full graph and the
derived tree.]
Figure 2.3 Hops to the backbone
high out-degree. Presumably, this is because they were dominated (at some small tolerance) by an
AS that is on the list. Alternet had the largest number of immediate children at 492, a little over
half of its out-degree (878) in the full graph. There were 2515 AS’s at the second level of the tree,
making the average number of children per backbone node 120. The top three levels include a total
of 4833 AS’s that are within 2 hops of the backbone.
AS clustering limitations
BGP routing tables don’t show peering relationships that often permit packets to take shortcuts
through the Internet. This is because routers will intentionally NOT advertise peers if they do
not want to provide transit services for those peers. We have not studied the extent to which these
relationships improve global traffic statistics.
Other complications can make the AS path less accurate. In RFC 1772 [82], Route Aggregation
allows an AS to advertise an aggregate route in which contiguous IP addresses can be collapsed to
a single entry. The rules of BGP4 require that the aggregated route contain all of the AS numbers
for any portion of the aggregation. This sometimes overstates the length of the AS path. It is also
possible to use an atomic aggregate, thus effectively hiding some AS numbers from appearing in
the AS path.
Our algorithm also depends on the AS path being a sequence, an ordered list of the AS numbers
traversed to deliver a packet to a given IP address range. The BGP4 specification allows an AS
path to be an unordered AS set, but requires that it become an AS sequence before it is passed as
a advertisement to a neighboring AS. In theory, this means that any BGP4 AS path farther than 1
hop away from its ultimate destination must be an AS sequence and our algorithm assumes this to
be true.
Route Views [90] is a standard source for timely, composite BGP information. It collects
BGP information from routers widely distributed throughout the Internet. Nonetheless, initial
investigation indicates that adding other routing tables would be unlikely to materially affect our
clustering. Route Views already incorporates a sufficient number of routers near the centroid we
identified.
Finally, our algorithm creates a forest that sometimes makes an AS appear farther from the
backbone than it really is. This most often occurs because the cluster with the least overhang over
a subject cluster is preferred when the subject cluster picks a parent. The average depth of the
cluster tree was 1.961, whereas the average number of hops to the backbone in the full graph was
1.595. The right-hand graph in Figure 2.2 shows how these two metrics compare.
2.3 Client Demand Analysis
To map demand into our AS hierarchy, we needed to know the quantity and the composition
of client requests that come from each leaf cluster. A simple case is a web server with a single
host name. To demonstrate our cache placement techniques, we analyzed a single commercial
web server log. The incoming traffic is the set of requests to that server, and demand is the total count
of successfully answered requests and the total number of bytes delivered in replies. The number of
bytes in the requests is assumed to be small and cannot be easily captured, so we characterize the
incoming requests by count rather than by size in bytes. The outgoing traffic is the
replies to those requests. To simplify later analysis, we chose the set of requests that succeeded. In
this way, the count of incoming requests and the count of outgoing replies were the same. It is a
simple matter to total the number of bytes sent in reply to successful requests.
This process anonymizes the data so that individual IP addresses are not disclosed. We hope
that this level of anonymity is sufficient to protect the privacy of individuals while still allowing
us to publish useful results.
Converting IP addresses to AS numbers
The process of converting IP addresses to AS numbers is analogous to longest-prefix matching in
an IP router: each IP address is matched against the longest prefix in the composite routing table
obtained in the prior step. The demand summary [76] for each web server log is a compact file, suitable for sending
across the network to a collection point. Each demand summary file contains one line for each
AS number that had non-zero requests. The line contains the AS number, the count of successful
requests, and the number of bytes in replies.
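This lookup and the demand-summary aggregation can be sketched in a few lines. The two-entry prefix table, AS numbers, and log entries below are hypothetical; a real table would come from the composite BGP data, and a trie would replace the linear scan:

```python
import ipaddress
from collections import defaultdict

# Hypothetical BGP-derived prefix table: network -> origin AS number.
PREFIX_TABLE = {
    ipaddress.ip_network("10.0.0.0/8"): 100,
    ipaddress.ip_network("10.1.0.0/16"): 200,
}

def ip_to_asn(ip_str, table=PREFIX_TABLE):
    """Longest-prefix match of an IP address against the composite table."""
    ip = ipaddress.ip_address(ip_str)
    best = None
    for net, asn in table.items():
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, asn)
    return best[1] if best else None

def demand_summary(log_entries, table=PREFIX_TABLE):
    """Aggregate (client_ip, reply_bytes) pairs into one row per AS:
    asn -> (successful_request_count, reply_bytes)."""
    rows = defaultdict(lambda: [0, 0])
    for ip_str, reply_bytes in log_entries:
        asn = ip_to_asn(ip_str, table)
        if asn is None:
            continue          # no matching route; skip unroutable clients
        rows[asn][0] += 1
        rows[asn][1] += reply_bytes
    return {asn: tuple(v) for asn, v in rows.items()}

log = [("10.1.2.3", 5000), ("10.2.0.9", 1200), ("10.1.9.9", 800)]
print(demand_summary(log))   # one row per AS: (count, reply bytes)
```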
Web server log
For this study, we use a log from a commercial web server collected in February, 2001. The log
contained 18 hours of globally diverse requests: 402,955 requests totaling 3.69 Gigabytes, from
791 different autonomous systems. The 50 AS's with
Figure 2.4 Demand aggregated to the 21 backbone nodes (left panel: bytes of replies per cluster; right panel: delivery cost in Byte-ASHops per cluster)
the highest demand accounted for 232,991 requests and 2.19 GB. The web server log also contains
result codes that indicate errors, so the log includes activity that we chose not to consider. In
particular, result code 304 is a 300-series (redirection) code whose impact on our results is unclear;
we will investigate the 300-series result codes in a later study. For this study, we filtered out all of
the requests except HTTP GET requests with result codes 200 to 203 (various forms of success).
Demand Aggregation
Figure 2.4 shows the aggregate demand from each of the 21 major clusters in both bytes and
byte-ASHops. The graphs show that the commercial web server had clients that were concentrated
in certain areas of the Internet. The 3 busiest were the clusters whose exemplars were Verio,
Alternet, and AT&T with 64 percent of the bytes and 63 percent of the byte-ASHops in replies.
The BBNPlanet cluster was particularly interesting because it was also one of the best trees for
delivering the test data in the fewest ASHops (2.752 ASHops including 1 for BBNPlanet and 1
for the root). The clusters with averages above 3.5 ASHops were those represented by ESNET,
UUNET-Europe, LINX and EBONE.
Figure 2.5 Tadpole Graph Example (k AS hops to the backbone; AS 4: 600 bytes, AS 5: 0 bytes, AS 8: 400 bytes, AS 3: 500 bytes)
2.4 Cache Placement
The result of our clustering algorithm is a forest of trees containing clusters of AS’s in increas-
ingly detailed groups. The fundamental assumption is that analysis of a load pattern against this
model will yield a useful, objective measure of the value of placing caches into this forest. The
problem is similar to that posed by Li, et al. [52], but we simplified it by setting the delivery cost
to be the number of Autonomous Systems that the reply entered times the number of bytes in the
reply.
To do this, we assign a weight to each leaf node equal to the number of bytes given to it in
successful replies. Parent clusters of that leaf are responsible for finding the optimal use of ℓ
proxy caches for each value of ℓ up to m, the total number of proxy caches we can afford to place.
Each node can choose to distribute those ℓ caches in any amounts among its children and can
choose to keep one for itself. We visualize this as pebbles placed onto the tree wherever a proxy
cache is indicated. Our cache placement study assumes that any proxy cache will completely
satisfy all requests sent to it. We assume that all requests are sent to web servers on the backbone.
The cost of each reply is the number of AS’s that see the reply (including the originating AS)
multiplied by the size in bytes of the reply. The cost of the requests is ignored.
Figure 2.5 shows a subtree near the bottom of a large tree. In the absence of caches, the 600
bytes of replies for AS 4 would be seen by k + 3 systems as they traveled from the backbone.
Placing a pebble at AS 4 will satisfy its 600 byte demand locally. If that were the only pebble
placed, the other 900 bytes of demand would escape and their cost would be 500(k + 1) + 400(k + 3).
So, the total cost of the AS 3 subtree given only a single pebble (and placing it at AS 4) is
600 + (1700 + 900k).
For any vertex v of the tree T , denote the subtree rooted at v by Tv. For k ≥ 0 we consider
a tadpole graph (T, k) defined as T appended by a single path extending upwards from the root
of T with k extra vertices. Traffic is said to escape if the request and reply need to traverse the k
vertices in the tail. The cost of a tadpole graph (T, k) is the cost of the subtree traffic plus k times
the cost of the traffic that escapes.
From the point of view of AS 3, the cost of the traffic will be different depending on how many
pebbles are used. We will use ℓ to represent the number of pebbles available. If ℓ = 0, AS 3 can
place 0 pebbles and its cost is 0 + (3500 + 1500k). If AS 3 can place ℓ = 4 pebbles, the cost of its
subtree is 1500, although in this case the pebble placed at AS 5 is not useful.
An interesting problem lies in comparing the options AS 3 has if offered only 1 pebble. At
k = 0, ℓ = 1, AS 3 should place the pebble at AS 4 for a total cost of 2300. But at k = 100, AS 3
would choose to put the only pebble on AS 3 for a total cost of 3500. Clearly, cost is not a simple
function of k.
The reader may want to test his understanding by optimizing the cost of AS 3’s subtree at k = 0
if we offer him 2 pebbles. The node can choose to keep one for himself and let his children use
one, or he can choose to let his children use both.
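The exercise can be checked mechanically with a brute-force evaluator. This sketch assumes the Figure 2.5 subtree has the shape AS 3 → AS 5 → {AS 4, AS 8}, inferred from the depths used in the costs above, and uses the convention that unpebbled demand is served from the backbone k hops above AS 3:

```python
from itertools import combinations

# Hypothetical shape for the Figure 2.5 subtree, inferred from the costs
# above: AS 3 (500 bytes) -> AS 5 (0 bytes) -> leaves AS 4 (600), AS 8 (400).
BYTES = {"AS3": 500, "AS5": 0, "AS4": 600, "AS8": 400}
PARENT = {"AS4": "AS5", "AS8": "AS5", "AS5": "AS3", "AS3": None}
DEPTH = {"AS3": 0, "AS5": 1, "AS4": 2, "AS8": 2}

def cost(k, pebbles):
    """Byte-ASHops for the AS 3 subtree with a tadpole tail of k AS's.
    Demand is seen by (distance to the nearest pebbled ancestor + 1) AS's,
    or by (k + depth + 1) AS's if it must escape to the backbone."""
    total = 0
    for node, w in BYTES.items():
        v, dist = node, 0
        while v is not None and v not in pebbles:
            v, dist = PARENT[v], dist + 1
        if v is None:
            dist = k + DEPTH[node]   # escaped: served from the backbone
        total += (dist + 1) * w
    return total

def best(k, num_pebbles):
    """Try every placement of at most num_pebbles pebbles."""
    nodes = list(BYTES)
    return min(cost(k, set(p))
               for n in range(num_pebbles + 1)
               for p in combinations(nodes, n))

print(best(0, 1))    # 2300: the single pebble goes to AS 4
print(best(100, 1))  # 3500: the single pebble moves up to AS 3
print(best(0, 2))    # the answer to the two-pebble exercise
```

The two printed one-pebble costs reproduce the 2300 and 3500 values computed above.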
Simultaneous placement algorithm
We are given a rooted tree with n vertices. Every leaf v is associated with a non-negative weight
w[v]. There are m pebbles, where m is at most the number of leaves. Consider any placement of up
to m pebbles on the vertices of the tree. A placement of pebbles is called feasible if every leaf with
a non-zero weight w[v] > 0 has an ancestor which has a pebble on it. Here the ancestor relation is
the reflexive and transitive closure of the parent relation; in particular, every vertex is an ancestor
of itself. The cost of any feasible placement P is defined as follows:

c(P) = Σv c(v),

where the sum is over all leaves v, and the cost associated with the leaf v, denoted by c(v), is
(λ + 1) · w[v], where λ is the distance from v to the closest pebbled ancestor of v. Here the distance
between two vertices of the tree is the number of edges on the unique shortest path between them.
For technical reasons we define the cost of an infeasible placement to be ∞.
The goal is to find a feasible placement P with at most m pebbles such that c(P) is minimized.
Binary tree case
We first consider the case of binary trees, where every vertex has at most two children. Of
course a leaf has no children. Thus for non-leaves, either there is a unique child, or there are two
children, in which case we order them as left and right arbitrarily.
For any vertex v of the tree T , denote the subtree rooted at v by Tv. Generically, if v has a
unique child then we denote that child by v1, and if there are two children then we denote them
v1 and v2 respectively. For k ≥ 0 we consider a tadpole graph (T, k) defined as T appended by a
single path extending upwards from the root of T with k extra vertices. Note that (T, 0) = T .
For ℓ ≥ 0, we will consider the optimal placement of at most ℓ pebbles in Tv, and denote the
minimal cost by fv(0, ℓ). More generally, for k > 0 and ℓ ≥ 0, we will consider the optimal
placement of one pebble at the tip of the tadpole graph (Tv, k), which has distance k from the root
v of Tv, and at most ℓ pebbles within Tv. We denote by fv(k, ℓ) the minimal cost c(P) of all
feasible pebblings P of (Tv, k) with at most ℓ pebbles in Tv, where if k > 0 we stipulate that one
additional pebble is placed at the tip of the external path from v. If k = 0 and ℓ = 0 then we have
a feasible pebbling if and only if all weights in Tv are zero, in which case fv(0, 0) = 0. Note that
for any k, ℓ ≥ 0 with k + ℓ ≥ 1, a feasible pebbling exists. For k = ℓ = 0, if some non-zero
weights exist in Tv, then no feasible pebbling exists, and we denote fv(0, 0) = ∞.
We will compute fv(k, ℓ) for all k, ℓ ≥ 0, inductively for v according to the height of the
subtree Tv, starting with leaves v.
More formally, let Lv be the number of leaves in Tv. Let dv = dv(T) be the depth of v in
T, i.e., the distance from the root of T to v (by our definition of distance, the depth of the root is
0). Let h(Tv) be the height of the tree Tv, which is the maximum depth of all leaves in Tv, i.e.,
h(Tv) = max_u du(Tv), where u ranges over all leaves in Tv. A tree with a singleton vertex has
height 0. Inductively for 0 ≤ h ≤ h(T), starting with h = 0, we compute fv(k, ℓ) for all v ∈ T
such that the subtree Tv has h(Tv) = h, for all 0 ≤ k ≤ dv, and for all 0 ≤ ℓ ≤ Lv.
Base Case h = 0:
In the base case h = 0 we are dealing with a singleton leaf, together with an extension of a
path of length k if k > 0, and no extension if k = 0.
Thus, for k = 0,

fv(0, 0) = 0 if w[v] = 0, and ∞ otherwise,

and for ℓ = 1 (note that h(Tv) = h = 0 implies that Lv = 1),

fv(0, 1) = w[v].

Now for k ≥ 1,

fv(k, 0) = (k + 1) · w[v],

and for ℓ = 1,

fv(k, 1) = w[v].
Inductive Case h > 0:
For the inductive case h > 0, we have some v with h(Tv) = h, and we assume we have computed
all fv′(k, ℓ) for children v′ of v. There are two cases: v has either one or two children. First we
consider the case where v has a unique child v1. For either k = 0 or k > 0, we can consider either
placing a pebble at v or not placing it there. But we claim that without loss of generality we don't
need to place it there. Because v has only one child, if an optimal pebbling places a pebble at v, we
can obtain at least as good a pebbling by moving the pebble from v to v1, and if v1 is already
pebbled we can simply remove the pebble at v. Thus, there is an optimal pebbling of (Tv, k) using
at most ℓ pebbles in Tv without a pebble at v. Hence,

fv(0, ℓ) = fv1(0, ℓ),

and for k > 0,

fv(k, ℓ) = fv1(k + 1, ℓ).
Suppose now v has two children v1 and v2. Basically we must decide how to distribute ℓ
pebbles in the subtrees Tv1 and Tv2, with ℓ1 and ℓ2 pebbles each. There is a slight complication as
to whether to place a pebble at v, the root of Tv, which affects how many pebbles there are to be
distributed: either ℓ1 + ℓ2 = ℓ or ℓ − 1.
First let k = 0. If we place a pebble at v (which of course presupposes ℓ > 0), then there are
ℓ1 + ℓ2 = ℓ − 1 pebbles to be distributed in Tv1 and Tv2, but with respect to these two subtrees the
"k" values are both 1, i.e., we have fv1(1, ℓ1) + fv2(1, ℓ2), minimized over all pairs ℓ1 + ℓ2 = ℓ − 1.
(To be precise, all pairs (ℓ1, ℓ2) such that 0 ≤ ℓ1 ≤ Lv1, 0 ≤ ℓ2 ≤ Lv2, and ℓ1 + ℓ2 = ℓ − 1; but
we will not specify this range explicitly in the following.)
If we don't place a pebble at v, then there are ℓ1 + ℓ2 = ℓ pebbles to be distributed in Tv1 and
Tv2, and since k = 0 for Tv, with respect to these two subtrees we still have the "k" value 0. So
we have fv1(0, ℓ1) + fv2(0, ℓ2), minimized over all pairs ℓ1 + ℓ2 = ℓ.
The optimal cost fv(0, ℓ) is the minimum of these two minimizations, i.e.,

fv(0, ℓ) = min { min_{ℓ1+ℓ2=ℓ−1} { fv1(1, ℓ1) + fv2(1, ℓ2) }, min_{ℓ1+ℓ2=ℓ} { fv1(0, ℓ1) + fv2(0, ℓ2) } }.

(It is understood that in case ℓ = 0, the first minimization is vacuous and should be omitted. This
is the standard convention: a minimization over an empty set (no non-negative ℓi sum to −1) is
∞. Also, the second minimization is merely fv1(0, 0) + fv2(0, 0), which is typically ∞ unless all
weights in Tv are zero, in which case it is 0.)
We consider the case k ≥ 1 next. For ℓ = 0 we have

fv(k, 0) = fv1(k + 1, 0) + fv2(k + 1, 0).

Suppose ℓ > 0. Again we have the possibilities of placing a pebble at v or not. Thus,

fv(k, ℓ) = min { min_{ℓ1+ℓ2=ℓ−1} { fv1(1, ℓ1) + fv2(1, ℓ2) }, min_{ℓ1+ℓ2=ℓ} { fv1(k + 1, ℓ1) + fv2(k + 1, ℓ2) } }.

This completes the description of the computation of fv(k, ℓ). The final answer is fr(0, m),
where r is the root of T and m is the number of pebbles. If m is given (typically much smaller than
the number of leaves), one never needs to compute for any number of pebbles ℓ beyond m, i.e.,
all ℓ ≤ m.
We estimate the complexity of the algorithm. Let H = h(T) be the height of the tree; typically
H ≈ O(log n). For leaves, the algorithm spends O(dv) = O(H) time per leaf. For each vertex
with one child the time is O(dv · min{Lv, m}) = O(Hm). For each vertex with two children it is
O(dv · min{Lv, m}^2) = O(Hm^2). Hence the total running time is at most O(nHm^2), which is only
O(nm^2 log n) with H ≈ O(log n).
It is also clear that the above algorithm can be easily modified to compute the actual optimal
placement in addition to the optimal cost.
General trees
We now generalize the above algorithm to an arbitrary tree. First, for a leaf node v, we define
fv(k, ℓ) to be the minimal cost c(P) of all feasible pebblings P of (Tv, k) with at most ℓ pebbles in
Tv, where if k > 0 we stipulate that one additional pebble is placed at the tip of the external
path from v. Note that in the case of a leaf node, Tv is a singleton, and if k > 0 then (Tv, k) is a
single path of length k. Also 0 ≤ ℓ ≤ Lv = 1, and 0 ≤ k ≤ dv.
Thus, the computation for the leaves is identical to that in the binary tree case. If k = 0, then

fv(0, 0) = 0 if w[v] = 0, and ∞ otherwise,

and for ℓ = 1,

fv(0, 1) = w[v].

For k ≥ 1,

fv(k, 0) = (k + 1) · w[v],

and for ℓ = 1,

fv(k, 1) = w[v].
We now consider non-leaf nodes v. Let ∆ be the number of children of v, let v1, v2, . . . , v∆ be
its children from left to right, and let the subtrees rooted at the children of v be Tv,1, Tv,2, . . . , Tv,∆,
respectively. Denote by Tv,[d] the subtree of Tv induced by the vertex set {v} ∪ ⋃_{i=1}^{d} Tv,i, for
1 ≤ d ≤ ∆. Denote by Lv,d the total number of leaves in Tv,[d].
Define f^b_{v,d}(k, ℓ), where b = 0 or 1, 1 ≤ d ≤ ∆, 0 ≤ ℓ ≤ Lv,d, and 0 ≤ k ≤ dv, as follows.
First let k = 0. If b = 0, then f^0_{v,d}(0, ℓ) is the minimal cost of a pebbling placement of the subtree
Tv,[d], where we use at most ℓ pebbles in Tv,[d] and no pebble is placed on v. (When no feasible pebbling
placement exists with this constraint we have f^0_{v,d}(0, ℓ) = ∞.) If b = 1, then f^1_{v,d}(0, ℓ) is the same as
above except that v is pebbled with one of the ℓ pebbles.
This definition is generalized to k ≥ 0. For f^b_{v,d}(k, ℓ), we consider (Tv,[d], k) in place of Tv,[d],
and for k > 0 we stipulate that one additional pebble is placed at the tip of the external path from
v, at distance k from v. As before, this additional pebble is not counted in ℓ.
We then define

f^b_v(k, ℓ) = f^b_{v,∆}(k, ℓ),

and

fv(k, ℓ) = min{ f^0_v(k, ℓ), f^1_v(k, ℓ) }.
Again we will compute fv(k, ℓ) for all k, ℓ ≥ 0, inductively for v according to the height of
the subtree Tv, starting with leaves v. The base case h = 0 having already been taken care of, we
assume h > 0 and h(Tv) = h.
First we consider the leftmost subtree Tv,1, with d = 1, i.e., we compute f^b_{v,1}(k, ℓ) for (Tv,[1], k).
If k = 0 and b = 0, then

f^0_{v,1}(0, ℓ) = fv1(0, ℓ).

Note that h(Tv1) < h, and thus inductively all fv1(k, ℓ) have been computed already.
Similarly, for k = 0 and b = 1,

f^1_{v,1}(0, ℓ) = ∞ if ℓ = 0, and fv1(1, ℓ − 1) if ℓ ≥ 1.

Note that in the last equation the "k" value in fv1 is 1, due to the stipulation that by b = 1 we placed
a pebble on v.
Now we consider k ≥ 1. Again, if b = 0,

f^0_{v,1}(k, ℓ) = fv1(k + 1, ℓ).

Similarly, for k ≥ 1 and b = 1,

f^1_{v,1}(k, ℓ) = ∞ if ℓ = 0, and fv1(1, ℓ − 1) if ℓ ≥ 1.
We proceed to the case 1 < d ≤ ∆. This time we inductively assume that we have already
computed not only all fv′(k, ℓ) with h(Tv′) < h, but also the relevant quantities for (Tv,[d−1], k).
Thus, for k = 0 and b = 0,

f^0_{v,d}(0, ℓ) = min_{ℓ′+ℓ′′=ℓ} { f^0_{v,d−1}(0, ℓ′) + fvd(0, ℓ′′) }.

To be precise, the minimization is over all pairs (ℓ′, ℓ′′) such that 0 ≤ ℓ′ ≤ Lv,d−1, 0 ≤ ℓ′′ ≤ Lvd,
and ℓ′ + ℓ′′ = ℓ ≤ Lv,d.
For k = 0 and b = 1,

f^1_{v,d}(0, ℓ) = min_{ℓ′+ℓ′′=ℓ} { f^1_{v,d−1}(0, ℓ′) + fvd(1, ℓ′′) }.

Note that in fvd the "k" value is 1, since by b = 1 we have stipulated that a pebble is placed
on v. The range of (ℓ′, ℓ′′) is the same as before, except that in fact ℓ′ must be ≥ 1, otherwise the value
∞ will appear. (In particular, for ℓ = 0 the minimization is ∞.)
Finally we consider the case d > 1 and 1 ≤ k ≤ dv. For k ≥ 1 and b = 0, we have

f^0_{v,d}(k, ℓ) = min_{ℓ′+ℓ′′=ℓ} { f^0_{v,d−1}(k, ℓ′) + fvd(k + 1, ℓ′′) }.

And for k ≥ 1 and b = 1, we have

f^1_{v,d}(k, ℓ) = min_{ℓ′+ℓ′′=ℓ} { f^1_{v,d−1}(k, ℓ′) + fvd(1, ℓ′′) }.

Note that in the last equation the minimization is in fact over all pairs (ℓ′, ℓ′′) with ℓ′ ≥ 1, as well
as ℓ′ ≤ Lv,d−1, 0 ≤ ℓ′′ ≤ Lvd, and ℓ′ + ℓ′′ = ℓ ≤ Lv,d. But we do not need to explicitly state that
ℓ′ ≥ 1, since for ℓ′ = 0, f^1_{v,d−1}(k, 0) = ∞, as can be shown by an easy induction. Also note that the
"k" value in fvd is 1, due to the stipulation by b = 1 that v is pebbled by one of the ℓ′ pebbles.
We have completed the description of the algorithm. The final answer is fr(0, m), where r is
the root of T and m is the number of pebbles. Again, there is no need to compute for any value
ℓ > m, if m is the total number of pebbles given.
The complexity of the algorithm can be easily estimated as before. For leaves, the algorithm
spends O(dv) = O(H) time per leaf, so the total work spent on leaves is at most O(nH). For
any non-leaf v of degree ∆v, the computation work spent for v is O(∆v · H · m^2). Thus the total
amount of work spent on non-leaves is O(Σv ∆v · H · m^2) = O(nHm^2). Hence the total running
time is at most O(nHm^2), which is again only O(nm^2 log n) with H ≈ O(log n).
This is a polynomial time algorithm that computes the optimal pebbling placement as well as
the optimal cost of the pebbling placement. The running time is O(nHm^2), for any rooted tree of
n vertices, height H, and m pebbles.
Implementation of the Simultaneous Placement Algorithm
Our simultaneous placement algorithm is a dynamic programming algorithm that visits each
node exactly once to determine the best use of m caches in its subtree. The algorithm discovers the
optimal placement for every number of caches ℓ from 0 to m so as to minimize the total cost of traffic.
The result of running the evaluation on any node v is a matrix fv(k, ℓ) containing the
total costs of the subtree, where 0 ≤ ℓ ≤ m is the number of caches and k is the distance to the
nearest source of the data. For each element of the matrix, the node must choose how many pebbles
to give to each of its children and whether or not to keep a pebble for itself.
We define f^0_v(k, ℓ) to be the cost if a pebble is not used at v, and f^1_v(k, ℓ) to be the cost if v
distributes ℓ − 1 pebbles to its daughters and keeps one pebble for itself.
Leaf nodes can compute their cost matrix fv(k, ℓ) easily. If they are given one or more pebbles,
their cost is simply the number of bytes of replies needed by that AS. Let tv be the number of
local traffic bytes at node v. If a leaf is given zero pebbles, its cost is (k + 1) · tv, matching the base
case above. In the implementation, we used a matrix that is 15 rows high, representing values of
k from 0 to 14. In our study, the maximum number of pebbles, m, is set to 50, but could be increased
at the cost of running time and memory consumed by the algorithm.
Define f^b_{v,d}(k, ℓ) to be the cost of the subtree of Tv comprising v and its first d daughters,
where 1 ≤ d ≤ ∆ and ∆ is the number of daughters of node v.
Each row k of f^0_v(k, ℓ) is computed using row k + 1 from the daughters. Start with the first
daughter's row k + 1 intact. Then, for each subsequent daughter, test all distributions ℓ′ + ℓ′′ = ℓ
in which ℓ′ pebbles are given to the prior daughters and ℓ′′ pebbles are given to the new child:

f^0_{v,d}(k, ℓ) = min_{ℓ′+ℓ′′=ℓ} { f^0_{v,d−1}(k, ℓ′) + fvd(k + 1, ℓ′′) }.

When all ∆ children have been combined, the resulting matrix f^0_{v,∆}(k, ℓ) is f^0_v(k, ℓ).
Now we construct f^1_v(k, ℓ). The first element, f^1_v(0, 0), is ∞, because no pebble is available.
To find the rest of f^1_v(k, ℓ), take row 0 of f^0_v(k, ℓ) and shift it by one pebble, because the
children will only have ℓ − 1 pebbles to distribute. Note that all rows of f^1_v are copies of row 0:

f^1_v(k, ℓ) = f^0_v(0, ℓ − 1).

Finally, each element fv(k, ℓ) is the minimum of f^1_v(k, ℓ) and f^0_v(k, ℓ).
To compute the best placement for the whole tree, we compute the cost matrix of the root,
froot(k, ℓ). Row k = 0 contains the minimum cost for the whole tree for each value of 0 ≤ ℓ ≤ m.
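A minimal sketch of this matrix computation is given below. It follows the description above (daughters merged one row-combination at a time, f1 taken from row 0 of f0), but as an assumption beyond the formal model it allows demand at interior nodes as well as leaves. The tree and byte counts are the hypothetical Figure 2.5 example, and K must exceed the tree height plus the largest k queried:

```python
INF = float("inf")

def solve(tree, demand, root, K, m):
    """Return the root's cost matrix F[k][l]: minimal byte-ASHops when the
    nearest data source is k AS-hops above the node and at most l pebbles
    (caches) are placed inside the subtree."""
    def merge(a, b):
        # Combine two cost rows: best split of l pebbles between them.
        return [min(a[i] + b[l - i] for i in range(l + 1))
                for l in range(len(a))]
    def f(v):
        kids = [f(c) for c in tree[v]]
        w = demand[v]
        # f0: no pebble at v.  Own demand is seen by k+1 AS's; the
        # children see the source at distance k+1.
        f0 = [[(k + 1) * w] * (m + 1) for k in range(K)]
        for child in kids:
            f0 = [merge(f0[k], child[min(k + 1, K - 1)]) for k in range(K)]
        # f1: pebble at v.  Own demand is served locally, the children see
        # a source at distance 1, and one pebble is kept at v.
        row = [w] * (m + 1)
        for child in kids:
            row = merge(row, child[1])
        f1 = [INF] + row[:m]
        return [[min(f0[k][l], f1[l]) for l in range(m + 1)]
                for k in range(K)]
    return f(root)

# Hypothetical Figure 2.5 subtree: AS 3 -> AS 5 -> {AS 4, AS 8}.
tree = {"AS3": ["AS5"], "AS5": ["AS4", "AS8"], "AS4": [], "AS8": []}
demand = {"AS3": 500, "AS5": 0, "AS4": 600, "AS8": 400}
F = solve(tree, demand, "AS3", K=103, m=4)
print(F[0][1], F[100][1], F[0][2])  # 2300 3500 1500
```

Row k = 0 of the returned matrix plays the role of froot(0, ℓ) in the text; the printed values reproduce the one-pebble examples and the two-pebble exercise from Section 2.4.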
Practical computational cost
Let i be the number of interior (non-leaf) nodes in the tree (1594 in our study). Let H be the
height of the tree, the maximum number of AS-hops for any path (15 in our study). Let m be the
maximum number of proxy caches placed (50 in our study).
Each AS is visited exactly once to compute its cost matrix; the total number of cost matrices
computed is i. Each cost matrix has K rows (K = 15 in our study), so the total number of cost
rows computed is i · K. Each of those rows is a combination of the contributions from all of the
children of the node. Let δ be the number of children of node v. As previously noted, there will
be K rows at node v.
Each of those rows will have m + 1 items representing values from 0 to m pebbles. The initial
local cost matrix of the parent will be combined δ times with other matrices (once for each child).
After several simple optimizations, our test run with a tree of 21 backbone nodes totaling 6395
nodes had 69,486 row combinations in the 6395 matrix combinations.
The complexity for this more general case follows the same analysis as before: the work on leaves
is at most O(nH); each non-leaf v of degree ∆v costs O(∆v · H · m^2), for a total of
O(Σv ∆v · H · m^2) = O(nHm^2) on non-leaves; hence the total running time is at most O(nHm^2),
again only O(nm^2 log n) with H ≈ O(log n).
Theorem 1 There is a polynomial time algorithm that computes the optimal pebbling placement
as well as the optimal cost of the pebbling placement. The running time is O(nHm^2), for any
rooted tree of n vertices, height H, and m pebbles.
The proof follows from the above discussion.
Figure 2.6 Performance versus random and greedy placement (normalized traffic vs. number of caches for the Random, Greedy, and Simultaneous placements)
2.5 Evaluation of Cache Placement Impact
To measure the benefit of each new cache added to the tree, we compute the total traffic generated
in serving the sample web server log. Figure 2.6 shows the total traffic normalized to the traffic that would
result if 0 caches are used. In our test data, 3.41 Gigabytes of replies came from 790 of the 6395
clusters. Using the tree produced by the clustering algorithm, on average traffic touched 3.07 AS’s
including the AS at the backbone and the originating AS. The total cost of traffic in this test data
was 10.46 Gigabyte-ASHops.
Random Placement
For comparison, we compute costs for a placement algorithm that more closely matches the
way caches might be placed opportunistically in a practical case. We randomly chose 50 locations
out of the top 200 demand sites. The results in Figure 2.6 show that an occasional good guess
causes a noticeable decrease in traffic. In a graph that shows all 200 demand sites (not shown
here), the random algorithm took 193 caches to reduce the normalized traffic below 0.62, a level
that is a slight knee in the curves for other algorithms. Averaging a number of random runs would
smooth the curve, but would be unlikely to lower it.
Greedy Placement
Figure 2.6 also shows the results of a greedy placement algorithm that incrementally places
each cache at the hottest remaining site in the forest. Two greedy algorithms were attempted
with very similar results. Assume p caches have already been placed. Incremental placement of
the (p + 1)st cache is accomplished by pre-defining locations for the prior p pebbles. The algorithm
is then run with only one pebble allocated to the entire Internet. In fact, Figure 2.6 shows an even
simpler algorithm to determine the placement of a single, new cache. It chooses the uncached
AS with the highest local demand. We were surprised to see how well the greedy algorithms
performed and how closely their performance matched each other. The greedy algorithm reduced
the total traffic below 0.62 (normalized) by using the 10 AS’s with the highest local demand. In
fact, the first 11 locations chosen by the greedy algorithm matched the first 11 locations chosen by
the simultaneous placement algorithm (albeit in a different order).
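The greedy rule can be sketched as follows. The per-AS demand and depths are hypothetical, and for simplicity this sketch credits a cache only with its own AS's demand rather than its whole subtree's:

```python
def greedy_placement(demand, num_caches):
    """Incrementally pick the uncached AS with the highest local demand;
    earlier choices are never moved."""
    remaining = dict(demand)
    placed = []
    for _ in range(min(num_caches, len(remaining))):
        hottest = max(remaining, key=remaining.get)
        placed.append(hottest)
        del remaining[hottest]
    return placed

def normalized_traffic(demand, depth, placed):
    """Byte-ASHops with the given caches, normalized to the no-cache cost.
    Cached demand touches 1 AS instead of depth + 1."""
    base = sum(w * (depth[a] + 1) for a, w in demand.items())
    cost = sum(w * (1 if a in placed else depth[a] + 1)
               for a, w in demand.items())
    return cost / base

# Hypothetical per-AS demand (reply bytes) and AS-hop depth below the backbone.
demand = {"AS4": 600, "AS8": 400, "AS3": 500, "AS5": 0}
depth = {"AS4": 3, "AS8": 3, "AS3": 1, "AS5": 2}
caches = greedy_placement(demand, 2)
print(caches, normalized_traffic(demand, depth, caches))
```

With these toy numbers the two greedy caches go to the two highest-demand AS's, and the normalized traffic drops as each cache removes its AS's byte-ASHops from the total.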
Moreover, these incremental placement algorithms (random and greedy) more closely model
the financial reality that moving a cache from one location to another is typically not economical.
Simultaneous Placement
The dynamic programming algorithm discovered ways to cut the total traffic, in Gigabyte-ASHops,
in half using 42 caches. This is 10 fewer caches than greedy placement requires, and it is also a point
at which extra caches give little benefit. With 200 caches, the simultaneous placement algorithm
was able to reduce the traffic to 4 Gigabyte-ASHops.
Perhaps the greatest benefit of the simultaneous placement algorithm is the shape of the graph.
Figure 2.6 clearly shows diminishing returns beyond placing 11 caches. By running the algorithm
once, an analyst can see what the optimal result is for the entire range of 0 to m caches and compare
the benefits to the cost per cache.
2.6 Incorporating Knowledge of AS Relationships
To validate that our AS forest was accurate, we ran a series of empirical traceroutes. Our hope
was that packets traveling between widely separated AS’s would hop from AS to AS according to
the links in our AS forest. To do that we constructed a utility to send traceroute requests to route
servers that were widely dispersed in our AS forest topology. Each traceroute request specifies a
destination that is randomly chosen from the entire periphery of our AS forest. We denote the set
of route servers as R and the set of destinations as D. A traceroute request sent to route server
r ∈ R specifying destination d ∈ D would have the resulting path of hops Hr,d. An element of
Hr,d is a hop h with a hop number, the IP address of the router reporting the hop, and the round trip
time from r to the reporting router. In a subsequent step, we add in the AS number associated with
that IP address. The hop numbers on the hops in Hr,d increase by one each time the traceroute gets
closer to the destination. If the traceroute is a success, the last hop will have the IP address of the
intended destination, d.
Converting a traceroute with IP addresses into a traceroute with AS numbers is an imperfect
process. For each hop, h, of each traceroute, we translated the router link IP address to an AS
number using the centralized BGP table. Our results sometimes skip over an AS because packets
are lost, we got no response from the router, or because the router’s interface had an IP address
that belongs to the AS at the other end of the link. Because of route aggregation and other practical
limitations of BGP, our translation from IP address to AS could be wrong as well. Finally, ISP’s
need not use globally-routable IP addresses for links inside their own domain. If we miss seeing
the ingress into the AS, we might completely miss seeing the AS. Thus, our translated AS path
might understate the length of the true AS path.
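Assuming each hop's IP address has already been mapped to an AS number (with None where the mapping failed or the router did not answer), collapsing the hop list into an AS path can be sketched as:

```python
def hops_to_as_path(hop_asns):
    """Collapse a per-hop ASN list into an AS path, dropping consecutive
    duplicates and hops that could not be mapped (None)."""
    path = []
    for asn in hop_asns:
        if asn is None:
            continue            # lost packet or unmapped router address
        if not path or path[-1] != asn:
            path.append(asn)
    return path

# Hop ASNs resembling Table 2.2 (None marks a hypothetical unanswered hop).
hops = [8493, 8404, 8404, None, 3356, 3356, 3356, 3356, 1, 2381, 59, 59]
print(hops_to_as_path(hops))  # [8493, 8404, 3356, 1, 2381, 59]
```

Note that skipping None hops is exactly how an AS can silently disappear from the translated path, which is why the result may understate the true AS path length.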
An example traceroute, Hr,d, is shown in Table 2.2, where r is a route server in Switzerland
in AS8493 and d is an IP address in Wisconsin inside AS59. The first hop goes to 195.202.193.6,
presumably a border router connecting AS8493 to AS8404. Each row of the traceroute is successively
closer to the destination, d = 128.105.2.10. This traceroute shows that AS8493 is able to pass a
packet directly to AS8404, even though they differ in depth by two. This is a common occurrence
Table 2.2 Sample AS traceroute
Hop IP Address ASN AS depth RTT
1 195.202.193.6 AS8493 3 0
2 62.2.154.81 AS8404 1 1
3 62.2.4.222 AS8404 1 4
4 213.242.67.1 AS3356 0 5
5 212.187.128.61 AS3356 0 6
6 212.187.128.138 AS3356 0 6
7 64.159.1.69 AS3356 0 27
8 4.24.164.102 AS1 0 118
9 140.189.8.1 AS2381 1 128
10 146.151.164.50 AS59 2 129
11 128.105.2.10 AS59 2 130
in our forest, and is probably the result of clustering AS8493 to a parent that has a higher out-degree
and also has a link to AS8404. Note also that the route starts out far from the centroid (hop one is at
depth three), travels toward the centroid, reaches the forest floor, and then travels outbound to its
final destination.
Choosing traceroute starting points
For the traceroute starting points, we chose from the list of looking glass sites, traceroute
servers and route servers listed at www.traceroute.org. Many of those hosts provide a simple
interface that responds to an HTTP GET. The result is often plain text or trivially encapsulated
text inside HTML. The results were then parsed by a simple java program at our data collection
site. From the www.traceroute.org list of 882 servers we chose a list, R, of 135 servers, each in
a different AS, two or more hops from the centroid, that respond to an HTTP GET request with
easily parsed HTML.
Choosing traceroute destinations
To construct the traceroute destination set, D, we probed IP addresses to find one representative
IP address in each AS. Consider a representative IP address, d. If a local traceroute to that address
failed, the last hop of the resulting path will not be IP address d. Even when the last hop fails to
reach a working IP address, if a prior hop already shows the desired AS, it is a usable AS trace and
d can be added to D. Otherwise, we tried up to 10 more IP addresses by incrementing d in an
attempt to find an IP
address that would include at least one hop in our desired destination AS. If an AS had more than
one net-block of IP addresses, the other net-blocks were also probed. In our case, we were not able
to find a suitable IP address in 11% of the AS’s.
Over the week of March 11, 2002, we performed 200K traceroutes. Although this number is
comparable to other studies [71, 37] and much smaller than one study [9], our study did not need
repetitions of the same routes. When we had the traceroute collection fully automated we were
careful not to overload any single host with more than one traceroute request per minute. We are
grateful to the user community for maintaining traceroute servers and we do not want to abuse
their hospitality.
Noting the relationship between AS’s
We now improve on the forest constructed in Section 2.2 by annotating hop constraints and
by discovering new links that were not present in BGP tables. The annotations we add to each
hop along a path let us avoid using links for transit traffic if the ISP paying for the link would be
unlikely to allow transit between one of its providers and another of its providers.
The pattern we expected to see in each traceroute was the one identified by Gao [33]: each
packet should flow uphill customer to provider, c → p, (or laterally, sibling to sibling, s ↔ s)
until it reaches the highest point needed to reach an AS (or a sibling or peer of an AS) upstream of
the destination. Then the packet should flow only downhill provider to customer, p → c, until it
reaches the destination.
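Gao's pattern can be checked mechanically. The sketch below uses our own relationship labels ('up' for customer-to-provider, 'down' for provider-to-customer, 'flat' for sibling or peer hops), not identifiers from the thesis implementation; a path is "folded" exactly when an uphill hop follows a downhill one.

```python
def is_valley_free(rels: list) -> bool:
    """rels[i] labels the hop i -> i+1 as 'up', 'down', or 'flat'."""
    seen_down = False
    for r in rels:
        if r == "down":
            seen_down = True
        elif r == "up" and seen_down:
            return False        # uphill after downhill: a folded path
    return True
```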
[Figure: bar chart of path counts (0–20,000) vs. AS-hop path length (1–10), showing Folded, Not Folded, and Predicted paths]
Figure 2.7 Early forest predicted only a tiny portion of the non-folded routes seen by traceroute.
As other researchers previously noted [17, 33], a significant number of AS connections are hid-
den from most BGP tables. Figure 2.7 shows the results of the 74,963 unique complete traceroutes
when applied to the AS forest derived solely from BGP information. The majority of the paths
were from 3 to 6 AS hops long. A small number of paths were as long as 12 AS hops and a small
number of traces encountered routing loops at the inter-AS level.
The folded traces are the AS paths that appeared to flow uphill after having taken a downhill
hop. At that point, our AS forest had only provisional labels to categorize each link as a
customer-provider link or a sibling link. The not folded traces are the paths that did not violate
the uphill-to-downhill rule but contained links not in our AS forest. For a hop from AS_m to AS_n
we compare Depth_m to Depth_n in cases where the AS forest did not have a link at (m,n). Finally,
the predicted traces are paths that contained only AS hops in the AS forest.
[Figure: bar chart of path counts (0–14,000) vs. AS-hop path length (1–10), showing Folded, Not Folded, and Predicted paths]
Figure 2.8 Adjusting the annotations in the graph reduced the number of folded (implausible) paths and improved prediction.
Figure 2.8 shows the same paths after the Depth_n values have been refined. In this case, we
pause for learning each time a traceroute shows an uphill hop after the packet had already reached
a pinnacle. We used a Current Best Hypothesis algorithm [59] to test each hop of the traceroute.
Imagine a trace (k, l, ...,m, n) in which l was thought to be downhill from k, but n was thought
to be uphill from m. This folded trace violates one or more of the annotations we have made. At
least one of the links between k and m was annotated A(k, l) as a p → c link. Choose k and l to
be the closest instance of a p → c link. On the evidence of this traceroute, that could be a false
positive. Alternatively, A(m,n) was c → p, preventing us from using it on the downhill side (a
false negative). A special case where l = m is easily handled.
To choose the appropriate generalization or specialization, we select the link most refuted by
the evidence. That is, we track the failure count F (m,n) and success count S(m,n) of each
annotation. If the total evidence E = F(k,l) + S(k,l) + F(m,n) + S(m,n) exceeds a learning
rate threshold, α, we assume that we have seen enough cases to render a judgment. Each link,
(k,l), has an error proportion Err(k,l) = F(k,l)/(F(k,l) + S(k,l)). If Err(k,l) > Err(m,n)
we change (m,n) to s ↔ s by setting Depth_m = Depth_n. Alternatively, if the downhill link was
more probably incorrect, we set Depth_n = Depth_m. Since we have changed the depth of an AS,
we correct all of the annotations of the links to that AS.
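The per-fold update can be sketched as follows. The data structures (a depth map and per-link failure/success counters) and the function name are ours, not code from the thesis; (k,l) is the closest downhill-labeled link before the fold and (m,n) is the offending uphill-labeled link.

```python
def learn_from_fold(depth: dict, stats: dict, k, l, m, n, alpha: int = 6) -> None:
    """Flatten to a sibling link whichever annotation the evidence refutes more."""
    f_kl, s_kl = stats[(k, l)]
    f_mn, s_mn = stats[(m, n)]
    if f_kl + s_kl + f_mn + s_mn <= alpha:
        return                                  # not enough evidence yet
    err_kl = f_kl / (f_kl + s_kl)
    err_mn = f_mn / (f_mn + s_mn)
    if err_kl > err_mn:
        depth[m] = depth[n]   # change (m,n) to s <-> s at n's depth
    else:
        depth[n] = depth[m]   # the downhill label was more probably wrong
    # A full implementation would now re-derive the annotations of all
    # links touching the AS whose depth changed.
```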
The algorithm found exchange points like the Russian Universities Federal Network (AS3267)
quickly. Depth_3267 went from 9 hops from the backbone to 1. Others like the Milan
Interconnection Point (AS16004) rose 4 times. Whenever a Depth_n changes, other links become
c → p or p → c.
Figure 2.8 shows the results of learning depths. Bars show the average of 10 runs over the same
traceroutes using 10-fold cross-validation with α = 6. Higher values of α would require a larger
data set.
Since this fixed many of our mistakenly labeled customer-provider paths, previously folded
paths were now non-folded. Our algorithm had reversed some customer-provider pairs. Also, there
were improvements when unidirectional customer-provider links were upgraded to bidirectional
sibling links.
Adding learned relatives
In many cases, the traced routes showed links that were not present in our BGP-based AS
forest or even the BGP-based AS graph. We decided to add the most recent alternate parent to
each AS whenever a trace showed an unexpected uphill hop from that AS. We limited the learning
to identifying a single alternate parent for each AS. If we saved all of the alternate parents, the
program would eventually have learned all of the routes seen, but the number of “correct” paths
from one AS to another would grow too fast. This would have made our subsequent service
placement algorithm ineffective. We placed no limit on the number of learned siblings at the same
Depth_n.
[Figure: bar chart of path counts (0–20,000) vs. AS-hop path length (1–10), showing Folded, Not Folded, and Predicted paths]
Figure 2.9 Results with final AS forest
Figure 2.9 shows the results of allowing each node in the AS forest a list of siblings and a
single, alternate uphill link. We considered more sophisticated techniques for discovering the best
of the discovered links, but were satisfied that the simplest technique (saving the most recent)
was effective and reacted well dynamically. Again, the results are the average of 10-fold cross
validation with training sets of 67,467 traces and test sets of 7,496 traces. Over 91% of the test
set traces correctly followed the uphill-then-downhill pattern and were composed only of links
contained in our AS graph. Paths with 5 or more AS hops had noticeably higher error rates.
Now that the AS forest can credibly predict the path of traceroutes, we return to the service
placement problem to see how the addition of alternate parents affects the dynamic programming
problem.
2.7 Clustering Study Summary
In this chapter we have described methods for creating AS clusters based on BGP routing
data. The algorithm for creating a forest of AS numbers objectively discovers the AS’s that form a
highly interconnected backbone for the Internet. The resulting forest slightly overstates the average
number of hops from any point in the Internet to a common backbone, but is close enough to allow
the study of client demand and cache placement.
We have also presented a new, optimal method for placing caches in the AS hierarchy generated
by our clustering method. We compared the effectiveness of our algorithm to two incremental
techniques using a commercial Web log. We found that greedy placement of caches worked nearly
as well as the sophisticated, optimal technique when the number of caches was small or large.
Finally, this chapter presented a new methodology for annotating the inter-AS links to identify
customer-to-provider links and treat them appropriately when predicting packet travel. An impor-
tant discovery was the need to allow for one alternate parent for each AS to achieve acceptable
accuracy. This makes the AS-level graph more complex, but still much more succinct than the full
graph with little loss of accuracy.
Future Clustering Work
An important improvement in the topology would be annotations indicating the capacity and
propagation delay of each link. The current topology considers an entire AS to be a single node.
This is inaccurate when there are a large number of geographically dispersed routers in a single
AS. A trip across a particular AS might be arbitrarily short or it may be trans-continental or trans-
oceanic. The traceroutes used to validate the topology could also be harvested to determine which
links are long. An algorithm could be developed to separate each large AS into as many smaller
units as can be realistically differentiated. This approach requires that IP net blocks be used as
sources and destinations rather than AS numbers. The result would be a topology that would
contain long links as well as inter-AS links, and therefore, contain 80% of the links on which losses
occur. More research would be needed to assess the typical delay, jitter, and loss rate for each link.
Moreover, the nodes could then be associated with an interior buffering capacity (adding to jitter).
The resulting topology would be useful for capacity planning and quality of service studies.
The clustering algorithm could be made more general by varying the size of the centroid used
as the forest floor. The current choice to make the centroid very small (the 21 roots of the trees in
section 2.2) was done to accommodate visualizations. We believe that other studies (e.g. losses,
route stability, or jitter) would be better served by a much larger centroid containing the bulk of
the professionally-managed tiers of the global Internet.
Chapter 3
Large Scale Simulation of Congested Behaviors
In Chapter 2 we developed a concise, accurate graph of the Internet that naturally lends itself
to analysis. In this chapter, we investigate traffic congestion with such a graph. Most of the traffic
has to travel across multiple hops. Many of those connections have long round trip times. Our
approach is to aggregate large numbers of connections into just a few equivalence classes so that
we can analyze traffic patterns at a macroscopic level. This poses a problem. What parameters need
to be captured to characterize a collection of flows? In this chapter, we show that volume alone is
not enough to characterize the way connections (and, ultimately, collections of connections) react
to congestion.
This chapter chronicles a succession of simulations that led to the formation of a concise model
of congestion events. The model will be shown to accurately predict the proportion of time a
heavily congested link actually presents no queuing delay at all. Graphs produced by packet-
level simulations are compared to model output for validation. Our conclusion is that two new
parameters, RTT and ceiling, are important inputs to the function that determines how collections
of connections react to congestion. These are similar to parameters identified by the end-to-end
community to model the effect of a multi-hop interior on the individual flows.
The collection of connections with a common reaction will be referred to as a flock. We inves-
tigate aspects of flock formation and behavior. A discussion shows how connections with similar
RTT and a shared bottleneck can fall into resonant cadence. In this case the resonance is referred to
as window synchronization, and it helps us measure the extent to which congestion events are suc-
cessful. The notion of using RTT and a ceiling to characterize an individual connection was well
documented by Padhye [65] along with a closed form for the end-to-end case. We investigate it
hop-by-hop. Moreover, we extend our analysis of window synchronization to include a collection
of many connections with similar RTT.
In Chapter 4 we will try to infer the values of these important parameters from measurements
that can be taken at the edges of an ISP. Unlike traditional traffic matrix estimation, our traffic
matrix will incorporate these extra parameters for each flock.
3.1 Simulating Congestion and the Effect on Traffic
Much of the research in network congestion control has been focused on the ways in which
transport protocols react to packet losses. Prior analyses were frequently conducted in simulation
environments with small numbers of competing flows, and along paths that have a single low
bandwidth bottleneck. In contrast, modern routers deployed in the Internet easily handle thousands
of simultaneous connections along hops with capacities above a billion bits per second.
Packet dropping (seen by the intended recipient as a packet loss) is a simple mechanism for
signaling congestion. As each packet travels through consecutive links toward its final destination,
it may be competing with many other packets for space on links. If, in the aggregate, p_i packets
arrive during an interval in which the capacity of the link is smaller, the excess packets are enqueued
in buffers on the ingress router. If the queue continues to grow in subsequent intervals, it may get
backlogged enough that the router decides to ask connections to slow down. In the simplest case,
drop tail, if the queue is full at the moment a packet arrives, the packet is dropped. If the packet loss
is detected by the anticipated recipient, a flow control indication can be sent to the connection’s
sender to tell it to slow down. The seminal work on congestion avoidance is Jacobson’s Congestion
Avoidance and Control [40]. It tells the story of how a link from LBL to UC-Berkeley plummeted
from 32 Kbps to a mere 40 bps during an episode of congestive collapse. The problem was that
senders responded to a packet loss by flooding the network with another copy of that and all
subsequent packets in a transmission window. Jacobson goes on to outline a set of principles for
conservation of packets in which a new packet is not put into the network until an old packet
leaves. The goal is to discover a sending rate, λ, that will match the bandwidth delay product of
the path. Each ACK packet received by the sender clocks out a new data packet. For a mature
connection that has already discovered a bandwidth delay product, TCP occasionally probes to see
if it could increase λ. It does this by adding one more packet once per RTT, effectively performing
additive increase on λ. When λ grows too large for this connection’s share of a bottleneck link,
the router feeding that link will drop one or more packets. When the sender fails to receive an
acknowledgment of that packet within a reasonable time-frame (based on an estimate of the RTT),
the sender reduces λ to λ/2, multiplicative decrease.
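The congestion-avoidance rule described above can be sketched as a per-RTT update. This is an illustrative reduction only: real TCP also has slow start, timeouts, and fast recovery, none of which are modeled here.

```python
def aimd_step(cwnd: float, loss: bool) -> float:
    """One RTT of Jacobson-style congestion avoidance (window in packets)."""
    if loss:
        return max(1.0, cwnd / 2)   # multiplicative decrease on a drop
    return cwnd + 1.0               # probe upward: one extra packet per RTT
```

Iterating this update under periodic losses produces the familiar saw-tooth window trace referenced later in the chapter.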
Since packet loss is still the major mechanism for communicating congestion from the interior
of the network, characteristics of losses and bursts of losses remain important. Poisson models
of traffic initiation were tried and rejected [73, 31]. Fractals or Self-Similarity [51, 25, 26] have
been exploited for their ability to explain Internet traffic statistics. These models show that large
timescale traffic variability can arise from exogenous forces (the composition of the network traffic
that arrives) rather than just endogenous forces (reaction of the senders to feedback given to them
from the interior).
Traffic engineering tradition has been to size links to accommodate the mean load plus a factor
for large variability. The problem comes in estimating that variability. Cao et al. [13] provide
ways to estimate it and suggest that old models do not scale well when the
number-of-active-connections (NAC) is large. As NAC increases, packet inter-arrival times tend
toward independence. In particular, that study divides time into equal-length, consecutive
intervals and watches p_i, the packet count in interval i. In that study, the coefficient of
variation (standard deviation divided by the mean) of p_i goes to zero like 1/√NAC. The Long
Range Dependence (LRD) of the p_i is unchanging in the sense that the autocorrelation is
unchanging, but as NAC increases, the variability of p_i becomes much smaller relative to the
mean. In practical terms, link utilization of 50% to 60%, averaged over a 15 to 60 minute period,
is considered appropriate [12] for links with average NAC equal to 32. Cao's datasets include a
link at OC-12 (622 Mbps) with average NAC above 8,000. Clearly, traffic engineering models that
implicitly assume NAC values below 32 are inappropriate for fast links.
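The 1/√NAC scaling can be checked numerically. The sketch below superposes NAC independent Poisson packet streams (the per-stream rate and interval count are arbitrary choices for illustration, not values from [13]) and measures the coefficient of variation of the per-interval counts.

```python
import math
import random
import statistics

def poisson_sample(rng: random.Random, lam: float) -> int:
    """Knuth's Poisson sampler (adequate for small lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def cv_of_counts(nac: int, rate: float = 5.0, intervals: int = 2000,
                 seed: int = 1) -> float:
    """CV of per-interval packet counts for nac superposed Poisson streams."""
    rng = random.Random(seed)
    counts = [sum(poisson_sample(rng, rate) for _ in range(nac))
              for _ in range(intervals)]
    return statistics.stdev(counts) / statistics.mean(counts)
```

For a Poisson superposition the CV is exactly 1/√(NAC · rate), so quadrupling NAC should roughly halve the measured CV.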
The model presented in this chapter is a purely endogenous view. For simplicity, it only ex-
plores oscillations caused by the reactions of sources to packet marking or dropping. Each time a
packet is dropped (or marked), the sender of that packet cuts his sending rate (congestion window,
cWnd) using multiplicative decrease. Because there is an inherent delay while the feedback is in
transit, a congested link may have to give drops (or marks) to many senders. If the congestion was
successfully eliminated, connections are likely to enjoy a long loss-free period and will grow their
cWnd using additive increase. If the connections grow and shrink their cWnd in synchrony, the
global synchronization is referred to as window synchronization [93].
The most significant effort to reduce oscillations caused by synchronization is Random Early
Detection (RED) [28]. RED tries to break the deterministic cycle by detecting incipient congestion
and dropping (or marking) packets probabilistically. On slow links, this effectively eliminates
global synchronization [56]. But a comprehensive study of window synchronization on fast links
has not been made.
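For reference, the core of RED's marking decision can be sketched as follows. The parameter values are illustrative, not taken from [28] or from the thesis simulations; the point is only that an EWMA of the queue length drives a probabilistic drop, spreading losses across flows instead of hitting every flow at once.

```python
import random

class RedQueue:
    def __init__(self, min_th=50, max_th=150, max_p=0.1, w=0.002):
        self.min_th, self.max_th, self.max_p, self.w = min_th, max_th, max_p, w
        self.avg = 0.0      # EWMA of the instantaneous queue length

    def should_drop(self, qlen: int, rng=random.random) -> bool:
        self.avg += self.w * (qlen - self.avg)
        if self.avg < self.min_th:
            return False                        # no incipient congestion
        if self.avg >= self.max_th:
            return True                         # forced drop above max_th
        # Drop probability ramps linearly between the two thresholds.
        p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
        return rng() < p
```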
Key to understanding window synchronization is an understanding of the congestion events
themselves. One objective in this chapter is to develop a mechanism for investigating the duration,
intensity and periodicity of congestion events. Our model is based on identifying distinct portions
of a congestion event, predicting the shape of congestion events and the gap between them. Our
congestion model is developed from the perspective of queue sizes during congestion events that
have a shape we call a “shark fin”. Packets that try to pass through a congested link during a
packet dropping episode are either dropped or placed at the end of an (almost) full queue. While
this shape is familiar in both analytical and simulation studies of congestion, its characteristics in
measurement studies have not been reported.
The validation of these effects required highly accurate one-way delay measurements taken
during a four month test period with a wide geographic scope. We use data collected with the
Surveyor infrastructure [77] to show evidence that shark fins exist in the Internet. There are distinct
spikes at very specific queue delay values that only appear on paths that pass through particular
links.
Next, we explored the implications of regular spacing between congestion events. Connections
shrink their congestion windows (cWnd) in cadence with the congestion events. The cWnd’s
slowly grow back between events. In effect, the well-known saw-tooth graphs of cWnd [87] for
the individual long-lived connections are brought into phase with each other, forming a “flock”, a
set of connections whose windows are synchronized. Window synchronization has been studied,
but we document flocks that span a larger range of round trip times than previously reported [29].
From the viewpoint of a neighboring link, a flock will offer an aggregate load that rises together.
When it reaches a ceiling (at the original hop) the entire flock will lower its cWnd together. We
believe flocking can be used to explain synchronization of much larger collections of connections
than any prior study of synchronization phenomena.
The simulations in this chapter use infinitely long-lived TCP connections. Actual traffic in-
cludes a mixture of short and long-lived connections along with other traffic that is not controlled
by any congestion avoidance. Non-responsive connections do not slow down in response to losses.
There are also constant bit-rate sources (like Internet radio or video conferencing) that neither
speed up nor slow down in the presence of losses. We chose to avoid this complexity on the pre-
sumption that traffic can be divided into connections that remember the prior congestion event
versus uncontrolled traffic that does not. We depend on the independence assumption to assert that
the uncontrolled traffic adds to the mean but that its contribution to the variance of p_i becomes
very small relative to the mean at values of NAC found in gigabit links. Our findings would still
apply after subtracting the effect of uncontrolled traffic.
Explicit Congestion Notification (ECN) [81] promises to significantly reduce the delay caused
by congestion feedback. We will assume that marking a packet is equivalent to dropping that
packet. In either case, the sender of that packet will (should) respond by slowing down. Whenever
we refer to dropping a packet, marking a packet would be preferable because it does not require
retransmission and does not disrupt the steady pacing of packets arriving and generating ACKs to
clock out new data packets.
We investigate a spectrum of congestion issues related to our model in a series of ns2 [89]
simulations. We explore the accuracy of our model over a broad range of offered loads, mixtures
of RTT’s, and multiplexing factors. Congestion event statistics from simulation are compared to
the output of the model and demonstrate an improved understanding of the duration of congestion
events.
The strength of this model is that it easily scales to paths with multiple congested hops and
the interactions between traffic that comes from distinct congestion areas. Extending the model
to large networks promises to give better answers to a variety of traffic engineering problems in
capacity planning, performance analysis and latency tuning.
The rest of this chapter is organized as follows. In Section 3.2, we present the Surveyor data
that enabled our empirical evaluation of queue behavior. Section 3.3 introduces the notion of
an aggregate window for a group of connections and shows how the aggregate reacts to a single
congestion event. Section 3.4 presents ns2 simulations that show how window synchronization can
bond many connections into flocks. Each flock then behaves as an aggregate and can be modeled
as a single entity. In Section 3.5, we present our model that accurately predicts the interactions of
multiple flocks across a congested link. Outputs include the queue delays, congestion intensities
and congestion durations. Sample applications in traffic engineering are enumerated. Section 3.6
presents our conclusions and suggests future work in this topic. In the chapter on related work,
Section 5.2 discusses related work relevant to this chapter.
3.2 Surveyor Data: Looking for Characteristics of Queuing
Empirical data for this study was collected using the Surveyor [77] infrastructure. Surveyor
consists of 60 nodes placed around the world in support of the work of the IETF IP Performance
Metrics Working Group [39]. The data we used is a set of active one-way delay measurements
taken during the period from 3-June-2000 to 19-Sept-2000. Each of the 60 Surveyor nodes main-
tains a measurement session to each other node. A session consists of an initial handshake to agree
on parameters followed by a long stream of 40-byte probes at Poisson-distributed random intervals
with a mean interval between packets of 500 milliseconds. The packets themselves are
Type-P UDP packets of 40 bytes. The sender emits packets containing the GPS-derived timestamp
along with a sequence number. See RFC 2679 [4]. The destination node also has a GPS and
records the one-way delay and the time the packet was sent.
Each probe’s time of day is reported with a precision of 100 microseconds and each probe’s delay is
accurate to ±50 microseconds. Data is gathered in sessions that last no longer than 24 hours.
The delay data are supplemented by traceroute data using a separate mechanism. Traceroutes
are taken in the full mesh approximately every 10 minutes. For this study, the traceroute data was
used to find the sequence of at least 100 days that had the fewest route changes.
Deriving Propagation Delay
The Surveyor database contains the entire delay seen by probes. Before we can begin to com-
pare delay times between two paths we must subtract propagation delay fundamental to each path.
For each session, we assume that the smallest delay seen by that session is the propagation delay
between source and destination along that route. Any remaining delay is assumed to be queuing
delay. Sessions were discarded if traceroutes changed or if any set of 500 contiguous samples had a
local minimum that was more than 0.4 ms larger than the propagation delay. The presumption here
is that the minimum one-way delay for any set of 500 contiguous samples will be the propagation
delay. If the minimum changed, then the propagation delay probably changed. Since the granularity
of traceroutes (one per 10 minutes) was so much larger than the spacing between packets
(500 milliseconds average), we felt we needed to track changes in propagation delay to accurately
discard any packets near a route change.

[Figure: probability density of queuing delay (0–14,000 microseconds, 100-microsecond bins) for probes from Wisc, 9-Aug-2000 to 19-Aug-2000; paths to Colo, Utah, NCSA, BCNet, Wash]
Figure 3.1 Probability density of queuing delays of 5 paths
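The filtering rule described above can be sketched as follows. The function name and return convention are ours; the constants (windows of 500 contiguous samples, a 0.4 ms slack above the session minimum) come from the text.

```python
from typing import List, Optional

def queuing_delays(delays_ms: List[float], win: int = 500,
                   slack_ms: float = 0.4) -> Optional[List[float]]:
    """Return per-probe queuing delay, or None if the session is suspect."""
    prop = min(delays_ms)                 # session minimum = propagation delay
    for i in range(0, len(delays_ms) - win + 1):
        if min(delays_ms[i:i + win]) > prop + slack_ms:
            return None                   # local minimum rose: likely route change
    return [d - prop for d in delays_ms]
```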
Peaks in the Queuing Delay Distribution
Figure 3.1 shows the PDF of a variety of paths with a common source. They all share one OC-3
interface (155 Mbps) at the beginning and have little in common after that. The Y-axis of this
graph represents the number of probes that experienced the same one-way delay value (adjusted for
propagation delay). Counts are normalized so that the size of the curves can be easily compared.
Each histogram bin is 100 microseconds of delay wide.
Our conjecture was that a full queue in the out-bound link leaving that site was 10.3 milliseconds
long, and that probes were likely to see almost empty queues (outside of congestion events)
and almost full queues (during congestion events).
Figure 3.2 is included here to put the PDF in context. The cumulative distribution function
(CDF) shows that the heads of these distributions differ somewhat. The paths travel through differ-
ent numbers of queues and those routers have different average queue depths and link speeds. But
99% of the queue delay values are below 5 ms. From the CDF alone, we would not have suspected
that the PDF showed peaks far out on the tail that were similar width and height.
Figure 3.3 shows that a distinctive peak in the PDF tail is a phenomenon that is neither unique
nor rare. These paths traverse many congested hops, so there is more than one peak in their queuing
delay distribution.

[Figure: CDF of queuing delay (0–14,000 microseconds) for probes from Wisc, 9-Aug-2000 to 19-Aug-2000; paths to Colo, Utah, NCSA, BCNet, Wash]
Figure 3.2 Cumulative distribution of queuing delays experienced along the 5 paths.

[Figure: probability density of queuing delay (0–20,000 microseconds) for probes from Argonne, 2-Jun-2000 to 23-Sep-2000; paths to ARL, Colo, Oregon, Penn, Utah]
Figure 3.3 Probability density of queuing delays on 5 paths that share a long prefix with each other.

The path from Argonne to ARL clearly shows that it diverges from the other
paths and does pass through the congested link whose signature lies at 9.2 ms. Note that these paths
from Argonne do not show any evidence of the peak shown in Figure 3.1, presumably because they
do not share the congested hop that has that characteristic signature.
Other Potential Causes Of Peaks
Peaks in the PDF might be caused by measurement anomalies other than the congestion events
proposed in this chapter. Hidden (non-queuing) changes could come from the source, the destina-
tion, or along the path. Path hidden changes could be caused by load balancing at layer 2. If the
load-balancing paths have different propagation delays, the difference will look like a peak. ISPs
could be introducing intentional delays for rate limiting or traffic shaping. There could be delays
involved when link cards are busy with some other task (e.g. routing table updates, called the
coffee break effect [68]). Our data does not rule out the possibility that we might be measuring
some phenomenon other than queuing delay, but our intuition is that those phenomena would manifest
themselves as slopes or plateaus in the delay distribution rather than peaks.
Hidden source or destination changes could be caused by other user level processes or by
sudden changes in the GPS reported time. For example, the time it takes to write a record to disk
could be several milliseconds by itself. The Surveyor software is designed to use non-blocking
mechanisms for all long delays, but occasionally the processes still see out-of-range delays. The
Surveyor infrastructure contains several safeguards that discard packets that are likely to have
hidden delay. For more information see [44].
[Figure: probability (0–1) of 0 losses, exactly 1 loss, and more than 1 loss vs. congestion window size (0–50), for congestion duration 1.2 RTT and p(drop) = 0.06]
Figure 3.4 Showing the probability of losing 0, exactly 1, or more than one packet in a single congestion event as a function of cWnd.
3.3 Window Size Model
We construct a cWnd feedback model that predicts the reaction of a group of connections to a
congestion event. This model simplifies an aggregate of many connections into a single flock and
predicts the reaction of the aggregate when it passes through congestion.
Assume that a packet is dropped at time t_0. The sender will be unaware of the loss until one
reaction time, R, later. Let C be the capacity of the link. Before the sender can react to the losses,
C · R packets will depart. During that period, packets are arriving at a rate that consistently
exceeds the departure rate. It is important to note that the arrival rate has been trained by prior
congestion events. If the arrival rate grew slowly, it has reached a level only slightly higher than
the departure rate. For each packet dropped, many subsequent packets will see a queue that has
enough room to hold one packet. This condition persists until the difference between the arrival
rate and the departure rate causes another drop.
Figure 3.4 shows the probability that a given connection will see ` losses from a single conges-
tion event. This example graph shows the probabilities when passing packets through a congestion
event with 0.06 loss rate, L. Here R is assumed to be 1.2 RTT. Each connection with a congestion
window, W , will try to send W packets per RTT through the congestion event. We now compute
the post-event congestion window, W ′.
With probability p(NoLoss), a connection will lose no packets at all. Its packets will have
seen increasing delays during queue buildup and stable delays during the congestion event. Their
ending W ′ will be W + R/RTT . This observation contrasts with analytic models of queuing that
assume all packets are lost when a queue is “full”.
With probability p(OneLoss) a connection will experience exactly 1 loss and will back off.
The typical deceleration makes W ′ be W/2.
With probability p(Many), a connection will see more than one loss. In this example, a con-
nection with cWnd 40 is 80% likely to see more than one loss. Some connections react with simple
multiplicative decrease (halving their congestion window). TCP Reno connections might think the
losses were in separate round trip times and cut their volume to one fourth. Many connections
(especially connections still in slow start) completely stop sending until a coarse timeout. For this
model, we simply assume W ′ is W/2.
If an aggregate of many connections could be characterized with a single cWnd, W , a reaction
time, R, and a single RTT , the aggregate would emerge from the congestion event with cWnd W ′.
W′ = p(NoLoss) · (W + R/RTT) + (p(OneLoss) + p(Many)) · W/2
This change in cWnd predicts the new value after the senders learn that congestion has oc-
curred. In section 6, we will incorporate a simple heuristic to include a factor that represents the
quiet period if the losses were heavy enough to cause coarse timeouts.
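If we assume, for illustration, that the W packets passing through the event are dropped independently with per-packet rate L (a binomial sketch not spelled out in the text), the loss-count probabilities and the resulting W′ can be computed directly:

```python
def post_event_window(W, L, R_over_RTT):
    """Expected post-event congestion window W'.

    Assumes (our simplification) that each of the W packets sent through
    the congestion event is dropped independently with probability L.
    """
    p_no_loss = (1.0 - L) ** W
    p_one_loss = W * L * (1.0 - L) ** (W - 1)
    p_many = 1.0 - p_no_loss - p_one_loss
    # No loss: the window keeps growing for R/RTT round trips.
    # One or more losses: modeled as a single halving, as in the text.
    return p_no_loss * (W + R_over_RTT) + (p_one_loss + p_many) * (W / 2.0)

# The example from Figure 3.4: L = 0.06, R = 1.2 RTT, cWnd = 40.
w_prime = post_event_window(W=40, L=0.06, R_over_RTT=1.2)
```

For a cWnd of 40 the no-loss probability under this assumption is below 10%, so the expected post-event window lands close to, but slightly above, W/2.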
3.4 Congestion Events and Flock Formation
We use a series of ns2 simulations to understand congestion behavior details. The simulations
use infinite sources constantly providing data using TCP New Reno for flow control.
One Hop Simulation
We begin with a simulation of the widely used dumbbell topology to highlight the basic features
of our model. All of the relevant queuing delay occurs at a single hop. There are 155 connections
competing for a 155 Mbps link. We use infinitely long FTP sessions with packet size 1420 bytes
and a dedicated 2 Mbps link to give them a ceiling of 2 Mbps each. To avoid initial synchronization,
we stagger the FTP start times over the first 10 ms. End-to-end propagation delay is set to 50
ms. The queue being monitored is a 500 packet drop-tail queue feeding the dumbbell link.
[Figure 3.5 Ingress Traffic in One Hop Simulation — total volume of incoming packets (Mbps) and link capacity vs. time (s), OneHop.tcl]
Portions of the Shark Fin
Figure 3.5 shows two and a half complete cycles that look like shark fins. Our model is based
on the distinct sections of that fin:
[Figure 3.6 Queue Rise and Fall in One Hop Simulation — queue depth (packets) and losses vs. time (s), OneHop.tcl at 155 Mbps with 155 flows]
• Clear: While the incoming volume is lower than the capacity of the link, Figure 3.6 shows a
cleared queue with small queuing delays. Because the graph here looks like grass compared
to the delays associated with congestion, we refer to the queuing delays as “grassy”. This
situation persists until the total of the incoming volumes along all paths reaches the outbound
link’s capacity.
• Rising: Clients experience increasing queuing delays during the “rising” portion of Figure
3.6. The shape of this portion of the curve is close to a straight line (assuming acceleration
is small compared to volume). The “rising” portion of the graph has a slope that depends on
the acceleration and a height that depends on the queue size and queue management policy
of the router.
• Congested: Drop-tail routers will only drop packets during the congested state. This portion
of Figure 3.6 has a duration heavily influenced by the average reaction time of the flows.
Because the congested state is long, many connections have time to receive negative feedback (packet drops). Because the congested state is of relatively constant duration, the amount
of negative feedback any particular connection receives is relatively independent of the mul-
tiplexing factor, outbound link speed, and queue depth. The major factor determining the
number of packets a connection will lose is its congestion window size.
• Falling: After senders react, the queue drains. If an aggregate flow contains many connec-
tions in their initial slow start phase, those connections will, in the aggregate, show a quiet
period after a congestion event. During this quiet period, many connections have slowed
down and a significant number of connections have gone completely silent waiting for a
timeout.
PDF of Queuing Delay for One Hop Simulation
Figure 3.7 shows the PDF of queue depths during the One Hop simulation. This graph shows distinct sections for the grassy portion (depths 0 to approximately 50), the sum of the rising and falling
[Figure 3.7 Probability of a Given Queuing Delay in the One Hop Simulation — PDF vs. queuing delay, oneHop.tcl]
portions (histogram bars for the equi-probable depths from 50 to 480), and the point mass at 36.65 ms, when the delay corresponded to a full queue of 500 packets of 1420 bytes each feeding a 155 Mbps link.
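The 36.65 ms point mass is simply the drain time of a full queue; a minimal sanity check of the arithmetic:

```python
def full_queue_delay_ms(q_packets, pkt_bytes, link_bps):
    """Time (ms) to drain a full drop-tail queue onto the outbound link."""
    return q_packets * pkt_bytes * 8 / link_bps * 1e3

# One Hop parameters: 500-packet queue, 1420-byte packets, 155 Mbps link.
delay = full_queue_delay_ms(500, 1420, 155e6)  # close to the 36.65 ms in the text
```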
By adding or subtracting connections, changing the ceiling for some of the traffic or introducing
short-term connections, we can change the length of the period between shark fins, the slope of the
line rising toward the congestion event, or the slope of the falling line as the queue empties. But
the basic shape of the shark fin remains over a surprisingly large range of values and the duration
of intense packet dropping (the congestion event) remains most heavily influenced by the average
round trip time of the traffic.
Two Hop Simulation
[Figure 3.8 Simulation layout for two-hop traffic — FTP sources and sinks on 100 Mbps legs, each leg engineered to have a unique RTT, joined by 155 Mbps ingress–core and core–egress links, with cross traffic entering and exiting at each hop]
To further refine our model and to understand our empirical data in detail, we extend our
simulation environment to include an additional core router between the ingress and the egress as
shown in figure 3.8. Both links are 155 Mbps and both queues are drop-tail. To make it easy to
distinguish between the shark fins, the queue from ingress to core holds 100 packets but the queue
from core to egress holds 200 packets. Test traffic was set to be 15 long-term connections passing
through ingress to egress. We also added cross traffic composed of both web traffic and longer
connections. The web traffic is simulated with NS2’s PagePool application WebTraf. The cross
traffic introduced at any link exits immediately after that link.
[Figure 3.9 Both signatures appear when queues of size 100 and 200 are used in a 2-hop path — total delay and ingress delay alone (queue depth in packets) vs. time (s), with drops at ingress and core marked; TwoHop.tcl with cross traffic]
Figure 3.9 shows the sum of the two queue depths as the solid line. Shark fins are still clearly
present and it is easy to pick out the fins related to congestion at the core router at queue depth 200
as distinct from the fins that reach a plateau at queue depth 100.
The stars along the bottom of the graph are dropped packets. Although the drops come from
different sources, each congestion event maintains a duration strongly related to the reaction time
of the flows. In this example, one fin (at time t = 267 s) occurred when both the ingress and core routers were in a rising delay regime. Here the dashed mid-delay line shows the queue depth at the ingress
router. At most other places, the mid-delay is either very nearly zero or very nearly the same as the
sum of the ingress and core delays.
[Figure 3.10 The distinctive signature of each queue shows up as a peak in the PDF — PDF vs. queue depth (packets), TwoHop.tcl]
Figure 3.10 shows the PDF of queue delays. Peaks are present at queue depths of 100 packets
and 200 packets. This diagram also shows a much higher incidence of delays in the range 0 to 100
packet times due to the cross traffic and the effect of adding a second hop. In terms of our model,
this portion of the PDF is almost completely dictated by the packets that saw grassy behavior at
both routers. The short, flat section around depth 150 includes influences from both the rising regime at the ingress and the rising regime at the core. The falling edges of shark fins were so sharp in this example that their influence is negligible. The peak around queue depth 100 corresponds to 7.6 ms. It is not as sharp as
the One Hop simulation in part because its falling edge includes clear delays from the core router.
For example, a 7.6 ms delay might have come from 7.3 ms spent in the ingress router plus 0.3 ms
spent in the core. The next flat area from 120 to 180 is primarily packets that saw a rising regime
at the core router. A significant number of packets (those with a delay of 250 packet times, for
example) were unlucky enough to see a rising regime at the ingress and congestion at the core, or a rising regime at the core and congestion at the egress.
[Figure 3.11 Three hop simulation shows three distinct peaks — PDF vs. queue depth (packets), threeHops.tcl]
Flocking
In the absence of congestion at a shared link, individual connections would each have had their own saw-tooth graph for cWnd. A connection's cWnd (in combination with its RTT) dictates the amount of load it offers at each link along its path. Each of those saw-tooth graphs has a ceiling, a floor, and a period. Assuming a mixture of RTTs, the periods will be mixed.
Assuming independence, each connection will be in a different phase of its saw-tooth at any given
moment. If N connections meet at an uncongested link, the N saw-tooth graphs will sum to a
comparatively flat graph. As N gets larger (assuming the N connections are independent) the sum
will get progressively flatter.
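This flattening is easy to see numerically. The sketch below sums N sawtooths with independent random phases and mixed periods (the periods and amplitudes are illustrative, not taken from the simulations) and reports the coefficient of variation of the aggregate:

```python
import random

def sawtooth(t, floor, ceiling, period, phase):
    """Additive-increase sawtooth: ramps from floor to ceiling, then resets."""
    frac = ((t + phase) % period) / period
    return floor + (ceiling - floor) * frac

def aggregate_cv(n_conns, seed=0, ticks=2000):
    """Coefficient of variation of the summed load of n_conns sawtooths."""
    rng = random.Random(seed)
    # (floor, ceiling, period, phase) per connection; mixed periods/phases.
    conns = [(5.0, 10.0, rng.uniform(40, 80), rng.uniform(0, 80))
             for _ in range(n_conns)]
    totals = [sum(sawtooth(t, *c) for c in conns) for t in range(ticks)]
    mean = sum(totals) / ticks
    var = sum((x - mean) ** 2 for x in totals) / ticks
    return (var ** 0.5) / mean

flat_small, flat_large = aggregate_cv(4), aggregate_cv(64)
```

With independent phases the relative variation of the sum shrinks roughly as 1/√N, so the 64-connection aggregate is markedly flatter than the 4-connection one.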
During a congestion event, many of the connections that pass through the link receive negative
feedback at essentially the same time. If (as is suggested in this chapter) congestion events are
periodic, that entire group of connections will tend to reset to their lower cWnd in cadence with the
periodic congestion events. Connections with saw-tooth graphs that resonate with the congestion
events will be drawn into phase with it and with each other.
Contrast this with another form of global synchronization reported by Keshav et al. [80], in which all connections passing through a common congestion point synchronize regardless of RTT.
The Keshav study depends on the buffer (plus any packets resident in the link itself) being large
enough to hold 3 packets per connection. In that form, increasing the number of connections would
eliminate the synchronization. Window synchronization theory does not depend on large buffers
or slow links, but rather it depends on a mixture of RTTs that are close enough to be compatible.
Flock Formation
[Figure 3.12 Simulation environment to foster window synchronization — a 155 Mbps dumbbell between ingress and egress, fed by 100 Mbps legs, each leg engineered to have a unique RTT]
To demonstrate a common situation in which cWnd sawtooth graphs fall into phase with each
other, we construct the dumbbell environment shown in Figure 3.12. Each of the legs feeding the
dumbbell runs at 100 Mbps, while the dumbbell itself is a 155 Mbps link.
We give each leg entering the ingress router a particular propagation delay so that all traffic
going through the first leg has a Round Trip Time of 41 ms. The second leg has traffic with
RTT 47 ms, and the final leg has traffic at 74 ms RTT. We wanted to use a range of values that
represented regional round trip times that had no simple common factor.
[Figure 3.13 Connections started at random times synchronize cWnd decline and buildup after 2 seconds — aggregate offered load (Mbps) of the 41, 47, and 74 ms RTT legs, their total, and the link capacity vs. time (s)]
Figure 3.13 shows the number of packets coming out of the legs and the total number of packets
arriving at the ingress. Congestion events happen at 0.6 sec, 1.3 sec and 1.8 sec. As a result of
those congestion events, almost all of the connections, regardless of their RTT, are starting with a
low cWnd at 2.1 seconds. After that, the dumbbell has a congestion event every 760 milliseconds,
and the traffic it presents to subsequent links rises and falls at that cadence.
Not shown is the way in which packets in excess of 155 Mbps are spread (delayed by queuing)
as they pass through. The connections with 74 ms RTT are slow to join the flock, but fall into cadence at t = 2.1 s. Effectively, the load the dumbbell passes on to subsequent links is itself a flock, one with many more connections and a broader range of RTTs.
Range of RTT Values in a Flock
Next we investigate how RTT values affect flocking. We use the same experimental layout
shown in Figure 3.12 except that a fourth leg has been added that has an RTT too long to participate
in the flock formed at the dumbbell. Losses from the dumbbell come far out of phase with the range
that can be accommodated by a connection with a 93 millisecond RTT.
[Figure 3.14 Connections with RTT slightly too long to join flock — packets delivered in 50 seconds vs. number of FTP connections, showing dumbbell goodput and the offered load of the 41, 47, 74, and 93 ms RTT legs]
Figure 3.14 shows the result of 240 simulation experiments. Each run added one connection
in round-robin fashion to the various legs. When there is no contention at the dumbbell, each
connection gets goodput limited only by the 100 Mbps leg. The graph plots the total goodput and
the goodput for each value of RTT.
The result is that the number of packets delivered per second by the 93 ms RTT connections is
only about half that of the 74 ms group. In some definitions of fair distribution of bandwidth, each
connection would have delivered the same number of packets per second, regardless of RTT.
This phenomenon is similar to the TCP bias against connections with long RTT reported by
Floyd, et al. [27], but encompasses an entire flock of connections.
It should also be noted that turbulence at an aggregation point (like the dumbbell in this ex-
ample) causes incoming links to be more or less busy based on the extent to which the traffic in
the leg harmonizes with the flock formed by the dumbbell. In the example in Figure 3.14, the
link carrying 93 millisecond RTT traffic had a capacity of 100 Mbps. In the experiments with 20
connections per leg (80 connections total), this link only achieved 21 Mbps. Increasing the number
of connections did nothing to increase that leg’s share of the dumbbell’s capacity.
Formation of Congestion Events
The nature of congestion events can be most easily seen by watching the amount of time spent
in each of the queuing regimes at the dumbbell. We next examine the proportion of time spent in
each portion of the shark fin using the same simulation configuration as in the prior section.
[Figure 3.15 Proportion of time spent in each queue regime — number of ticks in each state (clear, rise, cong, fall) vs. number of FTP connections, NS simulation with 4 RTTs]
Figure 3.15 shows the proportion of time spent in the clear (no significant queuing delay),
rising (increasing queue and queuing delay), congested (queuing delay essentially equal to a full
queue), and falling (decreasing queue and queuing delay) regimes.
When there are fewer than 40 connections, the aggregate offered load reaching the dumbbell
is less than 155 Mbps, and no packets need to be queued. There is a fascinating anomaly from
40 to 45 connections that happens in the simulations, but is likely to be transient in the wild. In
this situation, the offered load coming to the dumbbell is reduced (one reaction time later) by an
amount that exactly matches the acceleration of the TCP window growth during the reaction time.
This results in a state of continuous congestion with a low loss rate. We believe this anomaly is the
result of the rigidly controlled experimental layout. Further study would be appropriate.
As the number of connections builds up, the queue at the bottleneck oscillates between con-
gested and grassy. In this experiment, the dumbbell spent a significant portion of the time (20%) in
the grassy area of the shark fin, even though there were 240 connections vying for its bandwidth.
Duration of a Congestion Event
Figure 3.16 shows the duration of a congestion event in the dumbbell. As the number of
connections increases, the shark fins become increasingly uniform, with the average duration of a
congestion event comparatively stable at approximately 280 milliseconds.
Chronic Congestion
Throughout this discussion, TCP kept connections running smoothly in spite of changes in
demand that spanned 2 to 200 connections. We searched for the point at which TCP has well-
known difficulties when a connection’s congestion window drops below 4. At this point, a single
dropped packet cannot be discovered by a triple duplicate ACK and the sender will wait for a
timeout before re-transmitting.
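The arithmetic behind that threshold is simple; a sketch, assuming the best case for the sender in which the dropped packet is the first of its window:

```python
# With congestion window cwnd, at most cwnd - 1 packets can follow a lost
# packet before the window stalls, and each elicits one duplicate ACK.
# Fast retransmit requires three duplicate ACKs, hence the cWnd-of-4 floor.
def can_fast_retransmit(cwnd):
    dup_acks = max(cwnd - 1, 0)
    return dup_acks >= 3
```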
Figure 3.17 shows what happened when we increased the number of connections to 760 and
spread out the RTT values. When the average congestion window on a particular leg dropped below
4, the other legs were able to quickly absorb the bandwidth released. In this example, the legs with
74 millisecond RTT and 209 millisecond RTT stayed above cWnd 4 and were able to gain a much
higher proportion of the total dumbbell bandwidth. When each leg had 90 connections (total 720),
the 209 ms leg had an average cWnd of 14.1, compared to 1.9 for its nearest competitor, RTT 74.
Subsequent runs with other values always had the 209 ms leg winning and the 74 ms leg coming
in second.
In this case, TCP connections with 209 ms RTT actually fared better than many connections
with shorter RTT. This directly contradicts the old adage, “TCP hates long RTT”. We speculate
that the sawtooth graph for cWnd for those connections is long (slow) enough so that the loss risk
is low for two consecutive congestion events. Perhaps the new adage should be “TCP hates cWnd
below 4”.
Short-Lived Flows and Non-Responsive Flows
Next we considered simulations that added a variety of short-lived connections. It is common
for the majority of connections seen at an Internet link to be short-lived, while the majority of
packets are in long-lived flows. For our purposes, we consider a connection short-lived if its
lifetime is shorter than the period between congestion events. The short-lived connections have
no memory of any prior congestion event. They neither add to nor subtract from the long-range
variance in traffic. As the number of active short-lived connections increases, more bandwidth is
added to the mean traffic. At high bandwidth (and therefore a high number of active connections),
both short-lived flows and non-responsive flows (typically a sub-class of UDP flows that do not
slow down in response to drops) simply add to mean traffic.
[Figure 3.16 Congestion Event Duration approaches reaction time — maximum, average, and minimum congestion event duration (seconds) vs. number of FTP connections]
[Figure 3.17 As flocks at each RTT drop below cWnd 4, they lose much of their share of bandwidth — average cWnd per RTT group (43–221 ms, nRTT = 8) vs. number of FTP connections]
3.5 Congestion Model
The simulation experiments in Section 3.4 provide the foundation for modeling queue behavior at a backbone router. In this section we present our model and an initial validation experiment.
Input Parameters
For a fixed size time tick, t, let C be the capacity of the dumbbell link in packets per tick and
Q be the maximum depth the link’s output queue can hold. The set of flocks, F , has members, f ,
each with a round trip time in ticks, RTTf , a number of connections, Nf , a ceiling, Ceilingf , and
a floor, Floorf . The values of Ceilingf and Floorf are measured in packets per tick and chosen to
represent the bandwidth flock f will achieve if it is unconstrained at the dumbbell and reacts only to its worst bottleneck elsewhere.
Operational Parameters
Let Bt be the number of packets buffered in the queue at tick t. Let Dt be the number of packets
dropped in tick t, and Lt be the loss ratio. Let Vf,t be the volume in packets per tick being offered
to the link at time, t. Reaction Time, Rf , is the average time lag for the flock to react to feedback.
Let Af,t be the acceleration rate in packets per tick per tick at which a flow increases its volume in
the absence of any negative feedback. Let Wf,t be the average congestion window.
Initially,
Vf,0 = Floorf
Wf,0 = (Vf,0 ∗ RTTf) / Nf
Rf = RTTf ∗ 1.2
Af,0 = ComputeAccel(Wf,0)
B0 = 0
For each tick,
[Figure 3.18 Scalable Model Logic — flowchart: for each tick and each flock, a flock told to slow down (or one at its ceiling) resets its volume to the floor, otherwise it adds its acceleration; the total offered load plus the prior queue is sent up to capacity; the excess is queued up to the queue depth, and the remainder is dropped, allocated to flocks, and remembered as future feedback; queue delay and congestion-event counters are then updated]
AvailableToSendt = Bt + Σf∈F Vf,t
Sentt = min(C, AvailableToSendt)
Unsentt = AvailableToSendt − Sentt
Bt+1 = min(Q, Unsentt)
Dt = Unsentt − Bt+1
Lt = Dt / (Dt + Sentt)
RememberFutureLoss(Lt)
For each flock, prepare for the next tick:
Wf,t+1 = ReactToPastLosses(f, L, Rf, Wf,t)
Af,t = ComputeAccel(f, Wf,t)
Vf,t+1 = Vf,t + Af,t
RememberFutureLoss retains old loss rates for future flock adjustments.
ReactToPastLosses looks at the loss rate that occurred at time t−Rf and adjusts the congestion
window accordingly. If the loss rate is 0.00, Wf,t is increased by 1.0/RTTf , representing normal
additive increase window growth. If the loss rate is between 0.00 and 0.01, Wf,t is unchanged,
modeling an equilibrium state where window growth in some connections is offset by window
shrinkage in others. The factor 0.01 is somewhat arbitrarily chosen. Future work should either
justify the constant or replace it with a better formula. If the loss rate is higher than 0.01, Wf,t is
decreased by Wf,t/(2.0 ∗ RTTf ). If Ceilingf has been reached, Wf,t is adjusted so Vf,t+1 will be
Floorf . To represent limited receive window, Wf,t is limited to min(46,Wf,t). The constant here
is 46 because 46 packets of 1420 bytes each (the packet size used in our simulations) fill a 64 KByte receive window. Early implementations of Linux actually used 32 KByte receive windows, but memory became cheap. Without window scaling, receive windows are limited to 64 KBytes.
[Figure 3.19 Finite State Machine for tracking the duration of congestion based on queue occupancy — states Clear, Rising, Congested, and Falling, with False Rising and False Falling arcs; transitions at 20%, 30%, 90%, and 95% occupancy]
In ComputeAccel, if Wf,t is below 4.0, acceleration is set to Nf packets per second per second
(adjusted to ticks per second). Otherwise ComputeAccel returns Nf/RTTf. Notice that the computation of the acceleration and Wf,t+1 here differs from the formula for W′ given earlier in this chapter. We adopted this compromise after the model failed to accurately predict the quiet time following a congestion event; the quiet time is primarily caused by connections that suffer a coarse timeout. The reaction time, Rf, should actually depend on one RTTf plus the time for a triple duplicate ACK to arrive. We use the simplification 1.2 × RTTf because we did not want to model the complexities of ACK compression and its effect on clocking out new packets from the sender.
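Pulling the tick equations and the ReactToPastLosses / ComputeAccel rules together, one model tick can be sketched as follows. This is a simplified transcription, not the deployed implementation: flock state lives in plain dicts, drops are allocated to flocks only through the aggregate loss rate, and units are abstract "ticks" as in the text.

```python
RWND_CAP = 46            # packets: approximately a 64 KByte receive window
EQUILIBRIUM_LOSS = 0.01  # loss rates below this leave W unchanged

def compute_accel(f):
    # Below cWnd 4, accelerate at N packets/s/s (here per tick);
    # otherwise the additive-increase rate N/RTT.
    return f["N"] if f["W"] < 4.0 else f["N"] / f["RTT"]

def react_to_past_losses(f, past_loss):
    W = f["W"]
    if past_loss == 0.0:
        W += 1.0 / f["RTT"]            # additive increase
    elif past_loss >= EQUILIBRIUM_LOSS:
        W -= W / (2.0 * f["RTT"])      # decrease, spread over one RTT
    # 0 < loss < 0.01: equilibrium, W unchanged
    return min(W, RWND_CAP)

def tick(t, flocks, B, C, Q, loss_history):
    """Advance the queue and every flock by one tick; returns the new B."""
    available = B + sum(f["V"] for f in flocks)          # AvailableToSend_t
    sent = min(C, available)                             # Sent_t
    unsent = available - sent                            # Unsent_t
    b_next = min(Q, unsent)                              # B_{t+1}
    dropped = unsent - b_next                            # D_t
    total = dropped + sent
    loss_history[t] = dropped / total if total else 0.0  # RememberFutureLoss
    for f in flocks:
        past = loss_history.get(t - f["R"], 0.0)         # feedback lags R_f
        f["W"] = react_to_past_losses(f, past)
        if f["V"] >= f["Ceiling"]:                       # constrained elsewhere
            f["V"] = f["Floor"]
        else:
            f["V"] += compute_accel(f)                   # V_{f,t+1} = V_{f,t} + A_{f,t}
    return b_next

# One unconstrained flock ramping up on an idle 50-packet-per-tick link.
flock = {"V": 1.0, "W": 5.0, "N": 10, "RTT": 5, "R": 6,
         "Ceiling": 100.0, "Floor": 1.0}
B, history = 0.0, {}
for t in range(10):
    B = tick(t, [flock], B, C=50.0, Q=20.0, loss_history=history)
```

While the offered load stays under capacity the queue stays empty and the flock's volume grows linearly at N/RTT packets per tick per tick, matching the "clear" regime.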
Outputs of the model
The model totals the number of ticks spent in each of the queue regimes: Clear, Rising, Con-
gested, or Falling. The Finite State Machine is shown in Figure 3.19. The queue is in Clear until it
rises above 30%, Rising until it reaches 95%, then Congested, then Falling when it drops to 90%,
and Clear again at 20%. False rising leads to Clear if, while Rising, the queue drops to 20%. False
falling leads to Congested if, while Falling, the queue grows to 95%.
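The FSM is straightforward to transcribe; a sketch driven by fractional queue occupancy, with the False Rising and False Falling arcs folded in as direct reversals:

```python
# Queue-regime FSM of Figure 3.19. Thresholds follow the text:
# Clear -> Rising above 30%; Rising -> Congested at 95% (or back to Clear
# at 20%, a "false rising"); Congested -> Falling at 90%; Falling -> Clear
# at 20% (or back to Congested at 95%, a "false falling").
def next_state(state, occupancy):
    if state == "Clear":
        return "Rising" if occupancy > 0.30 else "Clear"
    if state == "Rising":
        if occupancy >= 0.95:
            return "Congested"
        if occupancy <= 0.20:
            return "Clear"        # false rising
        return "Rising"
    if state == "Congested":
        return "Falling" if occupancy <= 0.90 else "Congested"
    if state == "Falling":
        if occupancy >= 0.95:
            return "Congested"    # false falling
        if occupancy <= 0.20:
            return "Clear"
        return "Falling"
    raise ValueError(state)

# A full shark-fin cycle of occupancy samples.
trace, state = [], "Clear"
for occ in [0.05, 0.40, 0.96, 0.97, 0.88, 0.50, 0.15]:
    state = next_state(state, occ)
    trace.append(state)
```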
[Figure 3.20 Queue regimes predicted by the congestion model — number of ticks in each state (clear, rise, cong, fall) vs. number of FTP connections, model output with 4 RTTs]
Calibration
Figure 3.20 shows what the model predicts for the simulation in Figure 3.15. Improvements
will be needed in the model to more accurately predict the onset of flocking, but the results for
moderate cWnd sizes are appropriate for traffic engineering models. The model correctly approx-
imated the mixture of congested and clear ticks through a broad range of connection loads. Even
though the model has simple algorithms for the aggregate reaction to losses, it is able to shed light
on the way in which large flocks interact based on their unique RTTs.
Extending the Model to Multi-Hop Networks
The ultimate value of the model is its ability to scale to traffic engineering tasks that would
typically be found in an Internet Service Provider. Extending the model to a network involves
associating with each flock, f , a sequence of h hops, hoph,f ∈ Links. Each link, link ∈ Links,
has a capacity, Clink, and a buffer queue length, Qlink.
Headroom Analysis
The model predicts the number, duration, and intensity of loss events at each link, link. It is
easy to iteratively reduce the modeled capacity of a link, Clink, until the number of loss events
increases. The ratio of the actual capacity to the needed capacity indicates the link’s ability to
accommodate more traffic.
Capacity Planning
By increasing the modeled capacity of individual links or by adding links (and adjusting the
appropriate hop sequences, hoph,f ), traffic engineers can measure the improvement expectation.
Similarly, by adding flocks to model anticipated growth in demand, traffic engineers can monitor
the need for increased capacity on a per-link basis. It is important to note that some links with rel-
atively high utilization can actually have very little stress in the sense that increasing their capacity
would have minimal impact on network capacity.
Latency Control
Because the model realistically reflects the impact of finite queue depths at each hop, it can be
used in sensitivity analyses. A link with a physical queue of Qlink can be configured using RED to
act exactly like a smaller queue. The model can be used to predict the benefits (shorter latency and
lower jitter) of smaller queues at strategic points in the network.
Backup Sizing
After gathering statistics on a normal baseline of flocks, the model can be run in a variety of
failure simulations with backup routes. To test a particular backup route, all flocks passing through
the failed route need to be assigned a new set of hops, hoph,f . Automatic rerouting is beyond the
scope of the current model, but would be possible to add if multiple outages needed to be modeled.
3.6 Simulation Summary
The study of congestion events is crucial to an understanding of packet loss and delay in a
multi-hop Internet with fast interior links and high multiplexing. We propose a model based on
flocking as an improved means for explaining periodic traffic variations.
A primary conclusion of this work is that congestion events are either successful or unsuccessful.
A successful congestion event discourages enough future traffic to drain the queue to the congested
link. Throughout their evolution, transport protocols have sought to make end-to-end connections
more efficient. Fast Retransmit in RFC 1122 [22] allows senders to recognize the loss of a packet
when they see a triple duplicate ACK from a receiver (caused by receiving the 3 packets after the
missing packet). From the viewpoint of the link queue, this made congestion events more likely
to be successful. With fast retransmit, senders are reacting sooner and the delay is independent of
the window size (assuming cWnd larger than four). This widens the portion of the design space
in which congestion events are successful. The protocols work well across long RTTs, a broad
range of link capacities and at multiplexing factors of thousands of connections. The result is that
a larger fraction of the congestion events in the Internet last for one reaction time and then quickly
abate enough to allow the queue to drain. Depending on the intensity of the traffic and the traffic’s
ability to remember the prior congestion, the next congestion event will come sooner or later.
The shape of a congestion event tells us two crucial parameters of the link being served: the
maximum buffer it can supply and the RTT of the traffic present compared to our own. We hope
this study helps ISPs engineer their links to maximize the success of congestion events. The
identification of 4 named regimes surrounding a congestion event may lead to improvements in
active queue management that address the impact local congestion events have on neighbors. The
result could be a significant improvement in the fairness and productivity of bandwidth achieved
by flocks.
When the model is applied to multi-hop networks, it can be used for capacity planning, backup
sizing and headroom analysis. We expect that networks in which every link is configured for 50%
to 60% utilization may be grossly over-engineered when treated as a multi-hop network. It is clear
that utilization on certain links can be high even though the link is not a significant bottleneck
for any flows. Such links would get no appreciable benefit from increased bandwidth because the
flocks going through them are constrained elsewhere.
Future Work
Further validation of the model’s scalability and accuracy would be important and interesting.
The model predicts the proportion of rising, falling, grassy and congested ticks even on heavily
loaded links with small window sizes. This should be validated by accurately measuring one-
way delays in a measurement infrastructure like Surveyor. Congestion event duration and the gap
between congestion events should be validated in an emulation environment with an appropriately large number of connections (at least thousands). Measurement equipment would need to record
losses on a much finer time scale (on the order of 1 ms granularity) than is currently available using
SNMP.
We plan to extend the model to cover the portion of the design space where congestion events
are unsuccessful. By exploring the limits of multiplexing, RTT mixtures, and window sizes with
our model we should be able to find the regimes where active queue management or transport
protocols can be improved. We also need to expand the model so it more accurately predicts the
onset of chronic congestion.
We do not yet know whether small buffers in routers are better than large buffers. Routers with very few buffers deliver prompt, informative losses to senders rather than building up large queues that
add jitter. Intuitively, this gives timely feedback to TCP senders and trains the TCP senders to stay
within their share of the bandwidth. The model needs to be exercised with an appropriate topology
and an appropriate traffic matrix of responsive and unresponsive traffic to compare congestion
using a small vs. a large amount of buffer space in routers.
The traffic engineering applications for the model are particularly interesting. Improvements
are needed to automate the gathering of baseline statistics (as input to the model) and to script
commonly used traffic engineering tasks so the outputs of the model could be displayed in near
real time.
Chapter 4
Traffic Matrix Estimation
In Chapter 3 we established the importance of RTT and Ceiling in determining how a
flock of traffic would react to congestion. In this chapter we will construct a traffic matrix for an
ISP. Using only information that can be readily collected and updated, can we construct a traffic
matrix that will be appropriately accurate for traffic engineering analyses? Clearly, our traffic
matrix will have to contain information about not just the volume of traffic from each source to
each destination, but also information about the RTT. Although we have identified Ceiling as an
important parameter, we were unable to invent a reliable mechanism for measuring Ceiling at the
edge of an ISP.
There are many reasons to build a traffic matrix. From a traffic engineering point of view, the
traffic matrix is used for capacity planning, performance analysis and backup assessment tasks. It
helps assess bottlenecks accurately, test proposed upgrades, and identify critical links that would
cause the most traumatic routing changes if they died.
The central challenge overcome in this chapter is the difficulty of determining window sizes
and RTTs from packets passing through. An ISP does not have the luxury of seeing the entire
connection end-to-end, nor do the packets carry any information that would immediately show the
current window sizes. So it became necessary to infer the congestion avoidance parameters from
data that could be economically gathered.
We noticed a feature of TCP that occurs in environments with a high bandwidth delay product.
TCP allows recipients to ACK every second packet using a mechanism called delayed ACKs [88].
We will show in section 4.3 that ACKs clocking out more than one new sender packet are common
in high bandwidth delay product connections but much less common in any other situation. Our
hypothesis is that high bandwidth delay product flows are likely to be memory limited – a tendency
we will leverage to infer the likely RTT of a flow. An example will clarify the inference. Consider a
connection with a 32 KByte rWnd, 320 KBytes per second throughput, and a bottleneck bandwidth
of more than 10 Mbps. The receive window limit is 32 KBytes per window times 10 windows per
second times 8 bits per byte, making 2.56 Mbps. Since this is less than the bottleneck bandwidth,
the connection will not be able to deliver any more than its 32 KBytes per window memory limit.
Assuming that we can identify this flow in the flow records (our hypothesis is that this flow will
have a high incidence of delayed ACKs), we can directly read the duration and the total number
of bytes from the flow record. Dividing the 32 KBytes per window by 320 KBytes per second, we
infer that each window is approximately 0.100 seconds.
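The arithmetic of this inference can be sketched in a few lines. This is a minimal illustration following the example above (function names are assumptions, and 1 KByte is taken as 1,000 bytes to match the 2.56 Mbps figure), not the thesis code:

```python
# Hypothetical helpers illustrating the RTT inference for a memory-limited flow.

def infer_rtt(rwnd_bytes, throughput_bytes_per_sec):
    """A memory-limited flow delivers one rWnd of data per RTT,
    so RTT is approximately rWnd / throughput."""
    return rwnd_bytes / throughput_bytes_per_sec

def rwnd_cap_bps(rwnd_bytes, rtt_sec):
    """Highest rate the receive window permits, in bits per second."""
    return rwnd_bytes * 8 / rtt_sec

# The worked example: 32 KByte rWnd, 320 KBytes per second observed throughput.
rtt = infer_rtt(32_000, 320_000)      # 0.100 seconds per window
cap = rwnd_cap_bps(32_000, rtt)       # 2.56 Mbps receive-window cap
assert cap < 10e6  # below the 10 Mbps bottleneck, so the flow is rWnd-limited
```

The assertion at the end restates the condition that makes the inference valid: the receive-window cap, not the bottleneck, limits the flow.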
The delayed ACK mechanism allows recipients the option of acknowledging only every second
packet, provided that the ACK is not withheld for more than a configurable timeout. Used properly,
delayed ACKs can reduce the protocol processing overhead in the sending and receiving hosts and
reduce the number of packets sent across the reverse path. Use of the term delayed ACK
strongly implies an ACK that increases the sender’s left window edge by exactly 2 packets.
While studying delayed ACKs we also noticed a substantial amount of stretch ACK behavior.
This is a regime in which, on average and over a long period of time, each ACK packet releases
more than 2 new source data packets. Stretch ACKs are referred to in RFCs as early as RFC
1122 [22]. We offer some suggestions for possible causes, but offer no proof. For the purpose
of this thesis, we define an ACK that moves the sender’s left window edge by more than 2 MSS
packets as a stretch ACK. We treat stretch ACKs as an indicator of high BDP no different than
delayed ACKs.
4.1 Capturing and Simplifying Abilene Traffic
This chapter delves into problems associated with measuring and reproducing real-life cus-
tomer demands to place on the topology we developed in Chapter 2. The goal is to produce a
traffic matrix that is appropriately accurate for our chosen topology. We wanted a topology we
could implement in emulation in the Wisconsin Advanced Internet Lab [49]. Each row in the
Figure 4.1 Abilene Network Backbone, February 2003
matrix represents the demands from one source and each column represents one destination. The
entries in each cell in the matrix are parameters important to a particular study. For example,
each cell might contain an array of connections with each connection having a round trip time, a
protocol, and parameters to characterize the on / off times for the connection.
A particular traffic matrix is a single moment in time. Once we have a matrix that represents
the Internet of today, we want to structure it so we can assess the Internet of many possible futures.
If we choose parameters wisely, the traffic matrix will be useful in hypothetical scenarios such as
scaling the volume to reflect an increased number of connections or growing the ceiling to reflect
faster last-mile technology connecting users to a particular network.
We chose the Abilene [1] topology because we had access to flow data for each 5 minute
segment of an entire day at all of the routers in that network. The geographic layout of Abilene
is shown in figure 4.1. Abilene also exposes a wide array of router statistics and design data that
makes it an excellent environment for future extensions to this research.
By contract, the Abilene backbone only carries traffic from Internet2 sites to Internet2 sites.
This makes the routing straightforward. Another aspect of Abilene that makes our study simpler
Figure 4.2 Weather map of Abilene shows bits per second for each link averaged over 5 minutes
is that most of the Autonomous Systems that connect to Abilene only connect at a single point.
Multiple points of interconnect are more common between commercial Internet Service Providers.
Parameterizing the Model
Internet traffic can be characterized by many parameters. Some parameters are closely related.
For example, volume (bits per second), and packet count (packets per second) are clearly related.
Other parameters like composition (ports used) and protocol give hints about the way the traffic
will react to congestion and the urgency of the traffic.
At the simplest level, the traffic volume in Abilene can be seen in the weather map [64] il-
lustrated in figure 4.2. The link utilization shows the number of bits per second averaged over
the preceding 5 minutes along each link. The link at 714 Mbps from New York City Manhattan
(NYCM) to Washington (WASH) represents all connections that feed into New York City from
other Abilene nodes or from links that enter Abilene at NYCM and head to Washington. Here link
color tells us the link is currently carrying between 5% and 10% of its capacity.
Since it is a readily-available and easily-understood metric, link utilization is the most com-
monly used tool on the traffic engineer’s tool-belt. Link utilization easily identifies links that are
grossly under-utilized and can alert the engineers to problems if it plummets or skyrockets unex-
pectedly.
But link utilizations above 95% are normal and appropriate for long-haul links in the commer-
cial Internet. As we saw in Chapter 3, high link utilization is not, by itself, a cause for concern.
Connections with congestion windows of 8 or more packets per RTT are well within the region
where TCP and TCP-friendly regimes are efficient and reliable. Moreover, link utilization does
not tell us the ultimate destination of packets, making it useless for analyses that predict traffic in
the event of a link failure. If a particular link went down, how much of its traffic would have to be
re-routed and which links would it impact? Other traffic engineering questions also depend on the
original sources and the ultimate destinations of traffic. How would congestion be affected if we
added new links? If a link is upgraded, will it cause other links to become bottlenecks?
Chapter 3 emphasized that simple link utilization alone is not sufficient to predict the way traffic
will react to congestion or to predict the way neighboring congestion will affect future traffic at
the link being analyzed. To get more detail than simple link utilization, we used volume (a number
analogous to the number of simultaneous TCP-style connections) and Round Trip Time (RTT).
Later, we added a notion of a ceiling (a bottleneck before or after our backbone or a memory limit
at either the sender or receiver).
As we discussed in Section 3.5, RTT is a crucial parameter in the achievable window size of a
connection. In this chapter, we augment that by showing how receiver and sender memory
limitations cause connections to reach ceilings before they reach their bandwidth delay product.
These limitations will become common as optical and gigabit connections to ISPs spread, to the
extent that the bottleneck bandwidth moves to the last mile or beyond it.
Measuring Demand
A complete set of packet headers with accurate timestamps from a network like Abilene would
provide a unique and important starting point for measuring demand. A library of protocol char-
acterizations could be developed that would let us label each flow with accurate information about
the way it reacts to congestion.
Unfortunately, fast backbones handle far more packets than we can reasonably capture or ana-
lyze. In this chapter, we use flow profiling [7]. Each flow is a unidirectional series of IP packets of
a given protocol, traveling between a source and a destination within a certain period of time. The
source and destination are defined as an IP address and port. A single flow record is considerably
smaller than the packet headers for the flow. A complete TCP connection is two or more unidirec-
tional flows recorded by a router as an accounting record. The flow record shows the source IP and
port, destination IP and port, start time, duration, protocol, and other information not needed here.
Abilene routers cannot afford to dedicate excessive resources to gathering and transmitting flow
data. After all, their primary function is routing data packets. Abilene routers are set to sample
uniformly one packet out of every 100 and build flow records only from the packets sampled.
For summary statistics, this gives appropriate accuracy. Ramifications of the 1% sampling are
discussed in Section 4.2. Capturing an entire day for all 11 routers in Figure 4.2 consumed about
13 Gigabytes of flow records. Flow records were then analyzed using FlowScan [74] to collect
together the volume of data from each source to each destination.
We expected little statistical difference between flows to and from the same autonomous sys-
tem. As a useful simplification and to improve anonymization, we aggregated all flows based on
source AS and destination AS. Over 90 percent of those AS’s had a unique attachment point to
Abilene. To determine attachment points to Abilene, we used only the destination AS number for
each flow. The source AS for flows has to be considered unreliable, since some IP address spoofing
slips through Abilene ingress filters. In Section 4.2, we show that 52 percent of the traffic on our
test day could have its entire path through Abilene described solely by knowing its source AS and
destination AS. The other 48% had either a source or destination that was an AS with more than
one attachment point.
Round Trip Time Estimate
Flow data gives no obvious clue to the RTT for the flow. Each flow record shows start time, end
time, byte count and packet count. Two flows with radically different RTT could have identical
flow records if they had different window sizes. RTT is a crucial parameter for understanding
everything from congestion reaction time to jitter in queue depth [78].
The quest for clues to the RTT of a flow led us to an interesting discovery. Connections with
a high bandwidth delay product have fewer ACKs per data packet than connections with a lower
BDP. Even the shortest Abilene backbone hop (NYCM to WASH) guarantees at least a 3 mil-
lisecond RTT due to the speed of light propagation delay. In Section 4.3 we discuss a technique
using ACK ratios to identify AS’s in places like New Zealand or Israel that have a long delay after
leaving Abilene. Using this technique, we separated the AS’s into those whose propagation delay
was dominated by their distance from Abilene versus those whose external propagation delay was
negligible.
The actual traffic matrix generated does not need to differentiate between AS’s. All AS’s near
a particular Abilene node are lumped together as equivalent. Other, more distant AS’s are named
for their egress point and a digit specifying the category of extra propagation delay. In practice, we
found adequate results using only 2 categories of extra propagation delay.
The remainder of this chapter is organized as follows: Section 4.2 describes how the data
was gathered to compute demand by AS. Section 4.3 describes the technique for using ACK and
data streams in flow data to estimate RTT. Section 4.4 describes the process of aggregating traffic
based on ingress, egress and external delay. Section 4.5 summarizes the results and concludes
with future directions. In the chapter on related work, Section 5.3 discusses work related to traffic
matrix estimation and the phenomenon of delayed and stretch ACKs.
4.2 Populating the Traffic Matrix
The Abilene project [1] provides a wide range of performance and design data about the United
States backbone for Internet2. Dynamic websites show such things as the current utilization of the
major backbone links [64] and the recent graphs of traffic on every major feed into Abilene. Router
statistics show how many packets were dropped and how many were forwarded. Flow data shows
traffic broken down by such things as protocol or port.
To predict the number of clear, rising, congested and falling ticks at each link in Abilene, we
take a model traffic matrix and run the model at each hop of each flock for each tick. All links can
be run in parallel, but the results of one tick affect the window sizes of each flow for the next
tick.
Each flock is characterized by a 3-tuple (ingress point, egress point, and exterior delay) along
with a multiplexing factor and a ceiling. Flocks that share the same 3-tuple can be combined into
a single flock with the total of the ceilings and the total of the multiplexing factors.
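The flock-combining rule above can be sketched as follows. This is an illustrative Python sketch (the tuple layout, function name, and sample values are assumptions, not the thesis implementation); flocks sharing a 3-tuple collapse into one flock whose multiplexing factor and ceiling are the sums of the parts:

```python
from collections import defaultdict

def combine_flocks(flocks):
    """flocks: iterable of (ingress, egress, ext_delay, mux, ceiling) tuples.
    Returns {(ingress, egress, ext_delay): (total_mux, total_ceiling)}."""
    combined = defaultdict(lambda: [0, 0])
    for ingress, egress, ext_delay, mux, ceiling in flocks:
        entry = combined[(ingress, egress, ext_delay)]
        entry[0] += mux       # multiplexing factors add
        entry[1] += ceiling   # ceilings add
    return {key: tuple(val) for key, val in combined.items()}

flocks = [("ATLA", "LOSA", 20, 15, 12),
          ("ATLA", "LOSA", 20, 5, 8),
          ("HSTN", "IPLS", 2, 3, 21)]
print(combine_flocks(flocks))
# {('ATLA', 'LOSA', 20): (20, 20), ('HSTN', 'IPLS', 2): (3, 21)}
```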
The results of the model include a detailed measure of the composition of the congestion at each
link. In addition, the model measures the resulting overall throughput of each flock. The graph of
achievable congestion window sizes shows how each end-to-end path is affected by global Abilene
congestion.
Minimal Window Size
There is enough information readily available in Abilene to model the traffic volume, under-
stand the traffic routing, and compute throughput. It is somewhat harder to measure customer
satisfaction. For the sake of this thesis, we will define explicitly that a customer is unhappy if
congestion in Abilene causes his congestion window to fall below 4 and stay below 4 until his
retransmission timeout (RTO) reaches more than 10 times RTT. The numbers are not as arbitrarily
chosen as they might seem. TCP depends on the triple-duplicate ACK mechanism to recover from
losses without falling back to a coarse timeout. TCP connections get roughly linear performance
as their window size decreases to 4. But TCP performance drops dramatically when it depends on
coarse timeouts. As more and more timeouts are needed, the exponential backoff algorithm causes
throughput to drop to frustrating and unacceptable levels. The abandonment rate is the rate at
which customers give up on TCP connections that are in progress. We assert that the abandonment
rate will be higher in environments with large numbers of coarse timeouts than in environments
with no coarse timeouts.
Service-level agreements (SLA’s) often specify a maximum acceptable loss rate (perhaps be-
cause it is easily measured). Managers assume that packet losses are bad and that the only way to
avoid customer complaints is to over-engineer capacity. But packet losses are the most important
feedback to TCP connections to tell them what bandwidth they should appropriately pace for. In
fact, many customers would get almost exactly the same total throughput even if they received sub-
stantially fewer losses from the core of the network. To investigate abandonment rate, we modeled
the range of congestion window sizes seen across the day.
Backbone Interfaces
In order to predict the path packets take through Abilene, we needed to construct a graph that
would map a flow with source AS, ASs, and a destination AS, ASd, onto the links that the flow
would traverse.
Table 4.1 Sample Link Tuples
From   To     Mbps    Queue Depth   Delay (ms)
SNVA   DNVR   10200   100           10
SNVA   LOSA   10200   100            3
STTL   SNVA     600   100            8
SNVA   KSCY   10200   100           12
DNVR   KSCY    2400   100            4
STTL   DNVR    2400   100           10
...    ...
Table 4.1 shows data that was gathered or inferred for each link in Abilene. Each link is
unidirectional. For example, the tuple from Sunnyvale (SNVA) to Los Angeles (LOSA) gives the
capacity of the link, a queue depth indicating the ability of the queue to buffer traffic headed to
LOSA, and the delay in milliseconds it contributes to RTT. Another tuple for LOSA to SNVA will
show the link from the point of view of LOSA and LOSA’s router’s queue.
Link delay was averaged and rounded from traceroute differences for connections that cross
those links. The delays listed are double the one-way delay to simplify the way connection RTT is
accumulated from hops.
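Because each listed delay already doubles the one-way delay, a path's backbone contribution to RTT is simply the sum of the link delays along the forward path. A minimal sketch using a few rows of Table 4.1 (the helper name and path are illustrative):

```python
# Link delays in milliseconds, taken from Table 4.1; each value is
# already double the one-way delay, so summing along the forward path
# yields the backbone RTT directly.
LINK_DELAY_MS = {
    ("SNVA", "DNVR"): 10,
    ("DNVR", "KSCY"): 4,
    ("STTL", "SNVA"): 8,
}

def path_rtt_ms(hops):
    """hops: ordered list of nodes, e.g. ['SNVA', 'DNVR', 'KSCY']."""
    return sum(LINK_DELAY_MS[(a, b)] for a, b in zip(hops, hops[1:]))

print(path_rtt_ms(["SNVA", "DNVR", "KSCY"]))  # 14 ms of backbone RTT
```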
Volume of Traffic
Abilene flow data was used to discover the volume of traffic going from each source to each
destination. A typical flow record shows the detail available for each flow. The data received from
Abilene has been anonymized by zeroing out the low-order 12 bits of each IP address. To further
protect the privacy of customer data, each of the IP addresses was anonymized by scrambling the
top 20 bits. In tables shown in this thesis, IP addresses have been simplified to small, fictitious
numbers. All other data in Table 4.2 came from actual flow records.
Table 4.2 shows a few typical flows to illustrate the features and problems. The first two records
show a flow and its reverse flow between source IP 1.0.0.0 and IP 2.0.0.0 on ports 2490 and 2424.
Note that each flow record is one direction of the round trip. In Abilene, we are fortunate that
the reverse path travels along the same links. In the commercial Internet asymmetric routing is
more typical [71]. In the case of asymmetric routing, one of these records might be visible but the
reverse path may be handled by a different ISP.
The records at time 16:03 in Table 4.2 are, presumably, the ACK packets and the data packets
for a single connection. The connection from 2.0.0.0 to 3.0.0.0 shows 16 data packets, but only 13
ACK packets. It was very common for the number of ACK packets to be substantially smaller than
the number of data packets. Notice also the records for the connection between 2.0.0.0 and 6.0.0.0.
Since the packets are sampled at 1:100, the record for a data flow is often far away from the record
Table 4.2 Sample Flow Data Records
Date Time Source Destination Packets Bytes
2003/04/24 16:03:47 1.0.0.0.2490 2.0.0.0.2424 3 120
2003/04/24 16:03:56 2.0.0.0.2424 1.0.0.0.2490 15 22500
. . .
2003/04/24 16:04:01 2.0.0.0.3273 3.0.0.0.4458 16 24000
2003/04/24 16:04:02 3.0.0.0.4458 2.0.0.0.3273 13 520
. . .
2003/04/24 16:04:16 2.0.0.0.1073 6.0.0.0.3592 7 280
. . .
2003/04/24 16:04:15 4.0.0.0.3597 2.0.0.0.1073 9 13500
2003/04/24 16:04:16 2.0.0.0.1073 4.0.0.0.3597 7 280
. . .
2003/04/24 16:04:18 6.0.0.0.3592 2.0.0.0.1073 17 25500
. . .
2003/04/24 16:04:37 5.0.0.0.4377 2.0.0.0.2920 1 40
2003/04/24 16:04:39 5.0.0.0.4377 2.0.0.0.2920 1 40
2003/04/24 16:04:39 2.0.0.0.2920 5.0.0.0.4377 5 7500
for the corresponding ACK flow. In fact, the connection between 5.0.0.0 and 2.0.0.0 shows how a
single connection can often look like several flows in each direction.
Moreover, a flow may or may not show up at a prior or subsequent hop. Care must be taken to
avoid counting the same flow as though it were N flows if it passes through N nodes.
Table 4.3 Traffic Matrix Flow Tuple
Ingress Egress Exterior Delay Volume Ceiling
ATLA LOSA 20 15 12
ATLA LOSA 2 7 46
HSTN IPLS 2 3 21
LOSA KSCY 200 8 7
KSCY IPLS 2 25 44
IPLS LOSA 20 16 24
The actual tuples used in the model need three parameters for each modeled flow. Table 4.3
gives examples. The volume is assumed to be 100 times the number of data packets captured. The
exterior delay is estimated into broad categories based on the AS of the source and the AS of the
destination using an algorithm described in section 4.4. The Ceiling is estimated by taking the
volume of the flow and dividing by the duration.
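The derivation of a model tuple from a sampled flow record can be sketched as follows. The field names, function name, and sample values are assumptions for illustration; only the 100x scale-up for 1:100 sampling and the volume-over-duration ceiling come from the text:

```python
SAMPLING_FACTOR = 100  # Abilene routers sample one packet in 100

def flow_to_tuple(ingress, egress, exterior_delay, sampled_packets, duration_sec):
    """Build a (ingress, egress, exterior delay, volume, ceiling) model tuple
    from one sampled flow record."""
    volume = sampled_packets * SAMPLING_FACTOR   # undo the 1:100 sampling
    ceiling = volume / duration_sec              # flow volume over duration
    return (ingress, egress, exterior_delay, volume, ceiling)

# A hypothetical flow: 15 sampled data packets over a 300-second flow.
print(flow_to_tuple("ATLA", "LOSA", 20, 15, 300))
# ('ATLA', 'LOSA', 20, 1500, 5.0)
```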
Table 4.4 shows the final traffic matrix derived from the flow data. The total volume has been
normalized so that the unambiguous traffic adds up to 1000 units. Four Abilene nodes are shown,
broken into their near and far attached Autonomous System equivalence groups. The other 7 near
and 7 far groups are lumped into the category “other” solely for the presentation in this paper. The
actual model uses all 22 AS equivalence groups. The column and row for “unambig” give the total of
the data used in the model for that column or row.
Data whose source or destination is ambiguous is not factored into the model. It is included in
Table 4.4 to show which fraction of the traffic is ignored.
Table 4.4 Excerpt from Observed Traffic Matrix. Each entry is the volume of that flock in units normalized to a total volume of 1000 unambiguous connections
Dest
ambig chinF chinN iplsF iplsN losaF losaN snvaF snvaN Other unambig tot
ambig 117.3 6.4 18.0 3.1 18.5 14.2 7.3 7.5 2.3 168.4 245.8 363.1
chinF 5.3 0.1 0.1 0.3 0.9 4.9 1.1 0.7 0.5 16.6 25.3 30.6
chinN 15.7 0.2 0.6 0.1 7.0 8.5 9.5 5.7 4.1 57.9 93.7 109.4
iplsF 4.5 0.7 0.9 0.2 0.2 0.8 0.3 0.4 0.1 9.2 12.7 17.1
iplsN 16.5 2.7 4.4 0.3 1.1 4.2 9.2 1.3 1.8 68.5 93.5 110.0
losaF 30.0 3.7 11.9 1.1 6.4 0.1 0.0 0.6 0.0 60.1 83.8 113.8
losaN 12.0 1.5 11.1 0.3 1.6 0.0 0.0 0.2 0.0 26.9 41.7 53.7
snvaF 36.3 1.2 3.2 0.7 1.9 0.4 0.1 0.1 0.0 36.6 44.3 80.5
snvaN 4.2 1.0 0.4 0.0 0.9 0.0 0.0 0.1 0.0 22.7 25.1 29.4
Other 132.8 16.9 58.1 9.3 46.4 29.6 27.4 17.1 16.5 467.0 579.9 712.8
Unambig 257.4 28.0 90.9 12.2 66.2 48.4 47.6 26.3 23.2 657.2 1000.0 1257.4
Total 374.7 34.4 108.9 15.3 84.7 62.6 54.9 33.8 25.5 825.6 1245.8 1620.5
4.3 Ramifications of Sender and Receiver Memory Settings
We developed a method to infer the Round Trip Times of connections from flow data at the
AS level even if the flow data is sampled by as little as 1:100. Before we can discuss evidence of
memory limited flows, we briefly discuss TCP receive window, TCP send window and the effect it
has on congestion reaction.
Up to this point, we have assumed that flows will speed up sending more packets per RTT
window until they reach a limit based on their congestion window. Those flows are cWnd-limited
and will react to a congestion event (if they see it) by multiplicative decrease in their volume. But
what about flows that are incapable of supplying data fast enough to reach a congestion limit?
TCP receive window
The TCP receive window (rWnd) is specified by the receiver at initial connection. It is a
promise from the receiver to devote at least rWnd memory to this connection. Even if the user-
level process receiving the data is far behind, the kernel promises to accept delivery of rWnd bytes
of data. Typical values range from 16K Bytes to 64K Bytes. Values above 64K Bytes would
overflow the 16-bit window size field. An additional negotiated option, window
scaling, allows rWnd values larger than 64K Bytes. Window scaling is growing in popularity, but
actual rWnd values above 64K Bytes are still rare in the Internet.
Connections with a high bandwidth delay product often reach their memory limit before reach-
ing the cWnd that would have been their fair share. An example will clarify this. Suppose a
connection has a bottleneck bandwidth, BWc = 100Mbps, and RTT = 250ms. This connection
can get 4 windows per second and would need to supply 25 Mbits per window to fill its bottleneck’s
available bandwidth. Assuming 8 bits per byte, this translates to over 3 megabytes per window.
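The memory-limit test in this example amounts to comparing the window memory against the path's bandwidth-delay product. A small sketch (function names are assumptions, not from the thesis):

```python
def bdp_bytes(bottleneck_bps, rtt_sec):
    """Bandwidth-delay product: bytes needed in flight to fill the path."""
    return bottleneck_bps * rtt_sec / 8

def is_memory_limited(window_bytes, bottleneck_bps, rtt_sec):
    """True when the window memory is too small to fill the path."""
    return window_bytes < bdp_bytes(bottleneck_bps, rtt_sec)

# The example above: 100 Mbps bottleneck, 250 ms RTT.
print(bdp_bytes(100e6, 0.250))                   # 3,125,000 bytes per window
print(is_memory_limited(64_000, 100e6, 0.250))   # True: 64 KB is far below ~3 MB
```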
TCP send window
TCP senders are not required to send data just because the receiver is willing to receive it.
In fact the burden of actually keeping unacknowledged data lies with the sender. The sender
keeps a safety copy of every unacknowledged TCP packet in case it has to be retransmitted. This
retransmission buffer takes up memory, in the normal uncongested case, for one RTT. The TCP
send window, sWnd, is not mentioned in any TCP protocol interaction because there is no need to
inform the recipient.
Mathis [53] maintains a web page to help configure systems for high performance data trans-
fers. He reports that typical Unix systems include a default TCP send window of 32 KBytes to 61
KBytes. The default maximum values for TCP send window are between 128 KBytes and 1 MB.
Note that Windows NT 4.0 had no support for window scaling and could not accommodate TCP
send windows above 64 KBytes.
Consider a cWnd-limited connection, C, limited by cWnd=30,000B competing with a memory-
limited connection, M , characterized by sWnd=22,500B and cWnd=30,000B. For simplicity, we
assume the same RTT for both connections. During a 1 RTT congestion event with a loss rate,
L = 0.06, C will send 20 packets of 1,500B each and has a p(NoLoss) = 0.29 chance that it will
be unaware of the congestion event. The expected resulting window size for connection C will be
31, 500 ∗ 0.29 + 15, 000 ∗ 0.71 = 19, 787. This reflects the 29% chance the window will grow to
31,500B and the 71% chance it will shrink. In the aggregate, this was a drop of 10,213 Bytes, or
34%. The memory-limited connection will fare much better with 15 packets passing through the
event. The p(NoLoss) = 0.40 causes an expected result of 22, 500∗0.40+15, 000∗0.60 = 17, 965.
Note that 40% of the time this connection will not see any losses so it will neither shrink nor grow.
Connection C abated 10,213 Bytes of traffic per window, but connection M abated only 4,535
Bytes of traffic per window.
The cumulative effect of a succession of congestion events is that the message to “please slow
down” tempers flows with high congestion windows far more than their memory-limited
competitors. The cWnd-limited connections react more strongly to the congestion event and take longer
(in the aggregate) to come back up to the ceiling (if any) that limits their growth elsewhere in their
path. The memory-limited connections have no bandwidth bottleneck elsewhere in their path (or
they would not have been memory-limited). And, they grow back to their limit quickly before
leveling out. To the extent that a large collection of connections is memory-limited, it will abate
less in response to a congestion event and will grow back faster.
Delayed ACK Mechanism
So far, we have seen that memory limits significantly change the way a connection reacts to
congestion, but we have not shown any mechanism for differentiating bandwidth-limited connec-
tions from memory-limited connections. We will discuss the delayed ACK mechanism in TCP
when packets arrive in rapid succession. Later, we will use a measure of the prevalence of delayed
ACKs to distinguish between memory-limited and congestion-limited connections.
TCP tends to space the packets evenly across the window. The clear intent of the designers of
TCP was that almost all of the packets sent by a TCP connection are an immediate response to a
received ACK. But the penalty for not acknowledging a single packet is very small. Imagine, as
in the example above, that a TCP sliding window allows 20 unacknowledged packets in flight. If the
recipient skips sending half of the ACK packets, the sender will receive ACKs only for packets 2,
4, 6, 8, . . . 20. The sender reacts to ACK 2 by sending out packets 21 and 22. The connection
still easily fills the available window with data packets. In this example the odd numbered ACKs
would have had very little value though they cost CPU time, network time, and interrupts. If the
additive increase is triggered by the number of ACKs rather than the movement of the left edge of
the sender’s window, cWnd will increase only every other RTT.
Even in the early days of TCP, designers recognized that there could be several data packets
queued up inside the receiver. It would be wasteful to send an ACK while processing every packet.
A cumulative ACK could be generated when the queue becomes empty. The notion of a delayed
ACK (one for every Kth segment) was already in use when RFC 1122 [22] suggested that a TCP
implementation SHOULD limit K to 2.
RFC 1122 also states that a TCP implementation MUST set the maximum delayed ACK timeout
to 500 milliseconds. Later, RFC 3449 [21] states that, in practice, the delayed ACK timeout is
typically less than 200 milliseconds.
ACK Ratio
In the rest of this chapter, we will refer to the ACK ratio of a connection based on the average
number of data packets per ACK packet. An ACK ratio can easily be obtained from flow data
by taking the total number of ACKs for a connection and dividing it by the total number of data
packets. A ratio of 2:1 would mean, on average, each ACK packet acknowledges 2 data packets,
moves the senders left window edge by the size of 2 data packets and allows the sender to release
those 2 data packets. Note that this is the average over the entire life of the connection, including
ACKs that are received in the final round after data has finished.
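Computed from flow records, the ACK ratio is a simple division. The sketch below uses the 16-data-packet, 13-ACK connection from Table 4.2 (the function name is an assumption):

```python
def ack_ratio(data_packets, ack_packets):
    """Average number of data packets released per ACK packet, taken over
    the whole life of the connection."""
    return data_packets / ack_packets

# The Table 4.2 connection between 2.0.0.0 and 3.0.0.0: 16 data packets, 13 ACKs.
print(round(ack_ratio(16, 13), 2))  # ~1.23 data packets per ACK
```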
We hypothesize that the ACK ratio will be close to 1:1 for connections which have a bottleneck,
but will be higher if delayed ACKs can be used. Consider a connection whose packets travel
through a slow bottleneck. For example, at 56 kbps, each 1500 byte packet takes 214 milliseconds
of transmission time. The delayed ACK timer is likely to be smaller than 214 ms, so every data
packet will be acknowledged. On the other hand, a connection whose slowest hop is 100 Mbps can
have 8 such packets arrive in 960 microseconds.
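The spacing argument can be checked against serialization times. A small sketch whose numbers match the 56 kbps and 100 Mbps cases in the text:

```python
def serialization_time_sec(packet_bytes, link_bps):
    """Time to clock one packet onto a link of the given bit rate."""
    return packet_bytes * 8 / link_bps

# At 56 kbps each 1500-byte packet occupies the link for ~214 ms, so the
# delayed ACK timer fires first and every packet is ACKed individually.
print(serialization_time_sec(1500, 56_000))      # ~0.214 s per packet

# At 100 Mbps, 8 back-to-back packets arrive within 960 microseconds,
# well inside any delayed ACK timer.
print(8 * serialization_time_sec(1500, 100e6))   # 0.00096 s
```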
Connections should not turn on the delayed ACK mechanism until after they exit slow start.
During slow start it is important to inform the sender of round trip time and also set the release of
data packets to a widely dispersed pattern. On exiting slow start, delayed ACK may be enabled.
Later, the recipient will turn off the delayed ACK mechanism if he sees a gap in the sequence num-
bers of incoming packets. The gap probably signals a lost packet and the recipient wants to start
the fast-retransmit regime quickly. Recipients continue to emit one ACK (without delay) for every
incoming packet until the missing packet is received. This also tends to keep the packet pacing
well clocked. Congestion-limited connections should have lower overall ACK ratios because the
1:1 fast retransmit regime lasts for one entire RTT after each gap in packet sequence numbering. To
the extent that memory-limited connections are lossless, they have no need to ever turn off delayed
ACKing.
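The receiver policy just described can be caricatured as a toy rule: delay every other ACK while segments arrive in order, and ACK every segment once a gap appears. This is a simplification for illustration only (real TCP stacks track byte sequence numbers, delayed-ACK timers, and resume delaying once the hole is filled):

```python
def acks_emitted(segments, expected_first):
    """Count ACKs a toy receiver emits for a list of segment numbers.
    In-order traffic gets one delayed ACK per pair of segments; after a
    sequence gap the receiver ACKs every segment (fast-retransmit mode)."""
    expected, pending, acks = expected_first, 0, 0
    in_order = True
    for seq in segments:
        if seq != expected:
            in_order = False          # gap seen: stop delaying ACKs
        expected = seq + 1
        if in_order:
            pending += 1
            if pending == 2:          # delayed ACK for every second segment
                acks, pending = acks + 1, 0
        else:
            acks += 1                 # immediate ACK per segment after a gap
    return acks

print(acks_emitted([1, 2, 3, 4], 1))  # 2 ACKs: one per pair
print(acks_emitted([1, 2, 4, 5], 1))  # 3 ACKs: the gap at 3 disables delaying
```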
Figure 4.3 shows the number of bytes in flight as a function of time. The data comes from a
tcpdump of a portion of a long FTP over a 70 ms RTT connection through Abilene from Wisconsin
to Colorado. The tcpdump was taken on the sender side so that the flight size could be directly
[Plot: Delayed ACK example, RTT 38.1 ms, 8688 Byte sWnd. Bytes in Flight (0 to 9000) vs. Time in Seconds (75 to 75.4)]
Figure 4.3 Flight size graph shows one plus for each packet emitted by the sender. The 6 packets
in each round are not evenly spaced.
computed from captured packets. Flight size is the number of bytes sent but not yet acknowledged.
The connection in this example uses 1,448 Byte packets and is send-window limited to 8,688 Bytes
(6 packets) unacknowledged. This example shows a connection after slow start that uses delayed
ACKs in a high BDP environment. Each time an ACK arrives, it acknowledges two old packets
(in this case 4 packets ago). The graph then shows a column of two data packets released in rapid
succession. The first packet, at 75.002 seconds, is at 7,240 bytes in flight (presumably because there
were 5 prior packets still unacknowledged). But the next packet leaves only slightly later,
at 75.003, and shows up at 8,688 bytes in flight. In all cases, bytes in flight includes the bytes in the
packet being plotted (in these cases, 1,448 bytes). Those two packets are so close together in time
that they seem to be on the same vertical line. Time between columns of data packet departures is
idle time for the connection, waiting for the number of bytes in retransmission buffers (the bytes
in flight) to drop below the sWnd of 8,688.
Jitter in the departure times may be caused by uncertainty in the amount of time it takes to
dispatch the user-level process. As time progresses, the variations in the amount of time needed
to dispatch the user-level processes at both ends of the connection contribute to a compression of
the gap between ACKs. This can be seen around t = 75.3, where the idle gap is no longer being
controlled and packets are released in pairs that are haphazardly spaced.
Notice particularly that, although there were no losses, the connection in Figure 4.3 did not
accelerate because it is memory-limited on send window. The connection gets a throughput of
8,688 bytes per RTT even though the receiver would have permitted more throughput and the
congestion control conventions would have allowed the sender to try to send faster.
Stretch ACK Mechanism
TCP implementations SHOULD emit an ACK packet for every second data packet or more
frequently. But the Abilene flow data indicates that many TCP implementations have ACK ratios
that are significantly higher than 2:1. This could happen because of several flaws identified in RFC
2923 [50] and RFC 2525 [23], but this effect is too prevalent to be explained by those defects.
These RFCs use the term “Stretch ACK” to refer to a TCP receiver which generates an ACK less
often than every second full-sized segment.
Figure 4.4 Typical Stretch ACK Connection. (Plot: Bytes in Flight vs. Time in Seconds; stretch ACK example, RTT 38.1 ms, 32 KB rWnd.)
A typical “stretched ACK” connection is shown in Figure 4.4. In this example, both the send
window, sWnd, and the receive window, rWnd, are set to 32K Bytes. The graph shows that the
number of bytes in flight varies from a low of 16,000 to a high of 32,000, but that the packets are,
again, clumped into vertical bursts. Idle stretches, like the 20 millisecond gap at time 50.14, appear
when the sender is waiting for an ACK. This 20 millisecond gap is over 52% of the 38 millisecond
RTT. Not shown in the graph is the fact that the ACK that arrived at 50.148 released 6 packets and
an ACK slightly later at 50.149 released 6 more.
Stretch ACKs have not been widely studied in the literature because they do not appear in
low BDP environments and, even in high BDP environments they do not, in themselves, present a
problem. Since the entire path from sender to receiver consists only of high-speed connections, it
is likely that the routers in the path have enough buffering to handle the burstiness.
From reading the LINUX 2.4.18 source, we propose that the stretch ACKs seen in Abilene
could be caused by timer management and by granularity in dispatching the user-level processes
that consume the packets. When the recipient’s kernel receives a data packet, the kernel chooses
not to send an ACK if the queue to the user-level process is not empty. This obviates the need
for an additional timeout (and the overhead associated with adding a timeout to the sorted list of
timeouts only to delete it later when the cumulative ACK is sent). The ACK is, instead, generated
when the queue to the user-level recipient process becomes empty.
Timer management in LINUX became a major performance issue when LINUX became a
popular platform for web servers and proxy caches. Although it is quick to maintain the timers
for a dozen simultaneous TCP connections, the overhead of maintaining the myriad TCP timers
became a serious scalability limit if hundreds or thousands of simultaneous connections were active.
RFC 3449 [21] describes various techniques to create stretch ACKs as a means of controlling
ACK congestion. These techniques have been proposed in environments like cable modems, where
the upstream path is significantly narrower than the downstream path and the end user has only a
few, limited opportunities to send ACKs. If these techniques come into common practice, the model in
this thesis will become much less accurate.
Fraction of Achievable Bandwidth
Any memory-limited connection may consume only a fraction of the BDP along its path. An
easy way to characterize the intensity of the connection is to compare λ, the bandwidth it is using,
with the available bandwidth. For example, a connection using 2 Mbps in a 100 Mbps path is using
2% of the achievable bandwidth.
If several flows follow the same path through Abilene and have the high ACK ratio associated
with a memory-limited λ, then any difference between them has to be explained based on their
memory-limit or their RTT. We will assume that stretch ACKs happen in lossless connections.
Each connection grows λ until it reaches its memory limit. Then the fraction of time it spends
non-idle is the Fraction of Achievable Bandwidth (FAB). Since each flow record has a duration
associated with it, we can compute the throughput for that flow record. We further assume that
the connection with the highest throughput for that full path through Abilene is the achievable
bandwidth.
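Under these assumptions, FAB follows directly from flow records: each flow's throughput divided by the best throughput observed on the same path, which we take as the achievable bandwidth. The `(path, bytes, duration)` record layout below is hypothetical.

```python
# Sketch: Fraction of Achievable Bandwidth (FAB) per flow record,
# taking the highest observed throughput on a path as achievable.

def fab(flows):
    tput = [(path, nbytes / dur) for path, nbytes, dur in flows]
    best = {}
    for path, bps in tput:                  # achievable bandwidth per path
        best[path] = max(best.get(path, 0.0), bps)
    return [(path, bps / best[path]) for path, bps in tput]

flows = [("chin->ipls", 6.0e8, 100.0),      # 6 MB/s over 100 s
         ("chin->ipls", 1.2e8, 100.0)]      # 1.2 MB/s on the same path
print(fab(flows))  # the second flow uses 20% of the achievable bandwidth
```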
Evidence of Delayed and Stretched ACKs in Abilene
Figure 4.5 Typical Delayed ACK Connection. (Histogram: Frequency vs. Data Packets Per ACK Packet, bidirectional, AS26367.)
Figure 4.5 shows the ACK ratios derived from our Abilene flow data for connections from
Bradley University. Each flow was matched to its reverse flow by IP address and port. Only flows
whose data packets averaged more than 1,200 bytes each, whose ACK packets averaged less than
45 bytes each and with at least 6 data packets and at least 4 ACK packets were considered.
The graph shows a fairly clear bimodal distribution with a large number of connections having
an ACK ratio of 1:1, but another set of connections that have ACK ratios between 2:1 and 5:1.
Our supposition is that the former are connections on the dialup network that are limited by their
congestion windows and the latter are connections that are memory-limited and do not grow their
window large enough to cause congestion at any hop in their entire path.
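The flow-pair filter behind Figure 4.5 can be sketched as follows. The thresholds (1,200-byte average data packets, 45-byte average ACKs, at least 6 data and 4 ACK packets) come from the text; the record layout is an assumption.

```python
# Sketch: ACK ratio for a flow matched to its reverse flow by address/port.

def ack_ratio(fwd, rev):
    """Return data packets per ACK for a matched flow pair, or None."""
    data_avg = fwd["bytes"] / fwd["pkts"]
    ack_avg = rev["bytes"] / rev["pkts"]
    if data_avg > 1200 and ack_avg < 45 and fwd["pkts"] >= 6 and rev["pkts"] >= 4:
        return fwd["pkts"] / rev["pkts"]
    return None                              # flow pair fails the filter

fwd = {"bytes": 14480, "pkts": 10}           # 1,448-byte data packets
rev = {"bytes": 200, "pkts": 5}              # 40-byte ACKs
print(ack_ratio(fwd, rev))  # 2.0, the classic delayed-ACK ratio
```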
Note that these flows were monitored at Abilene and only contain connections that took at least
one Abilene hop. As a result, none of these connections have RTT less than the time it takes to get
from Bradley to Abilene’s Chicago router and from there to at least one other Abilene router. The
minimum RTT for those connections is 4 milliseconds.
Evidence of Memory-Limited Flows from Wisconsin
Figure 4.6 Throughput to Selected Korean Destinations from Wisconsin. (Histogram: Frequency, normalized to 1000 total, vs. Kbits per second in increments of 40; destinations 3786 DacomKR, 9274 PusanKR, 9277 ThruNetKR, 9318 HanaroKR, 9488 SeoulKR.)
Figures 4.6 and 4.7 were gathered from non-sampled flow data at the University of Wisconsin’s
border router on June 22, 2003. Five Korean domains and six European domains were monitored
for one full day. The graphs show the proportion of flows with each throughput in Mbits per
second.
Memory-limited flows would consistently reach a throughput inversely proportional to the RTT.
If the RTT is relatively stable, the graph of throughput should have tall peaks at each of the popular
Figure 4.7 Throughput to Selected European Destinations from Wisconsin. (Histogram: Frequency, normalized to 1000 total, vs. Kbits per second in increments of 40; destinations 137 ItalyIT, 2852 CESNetCZ, 6848 TelenetBE, 8434 TelenorSE, 8737 PlanetNL, 15589 EdisonIT.)
window sizes. Simple traceroutes were used to determine actual RTT. The Korean sites ranged
from 194 ms RTT to 233 ms RTT except for ThruNet (528 ms). The peaks in the graph at 490
Kbps represent a memory limit at 16 KBytes. This is likely to be the sWnd of the popular
mirror.cs.wisc.edu, the most heavily used IP address in our flow data. Other peaks could be the result
of other memory limits.
The European destinations show a similar peak at 690 Kbps. This is where it would be expected
given the 148 ms RTT to those destinations.
4.4 Coalescing Traffic into Minimal Unique Set
In this section we aggregate flows that are equivalent from the viewpoint of Abilene backbone
congestion. Flows that share the same ingress, egress and RTT can be aggregated simply by adding
their volume and ceiling.
AS exit points
The autonomous system is a convenient aggregation level for flows. There were 545 AS’s
mentioned as destinations in the flows captured from Abilene on April 24, 2003. Of those, 470 had
a unique exit interface. Even at routers several hops away from their exit, we can be confident that
we can predict the entire remaining path of that flow through Abilene.
It would also be possible to classify flows based on the interface they used to enter Abilene.
Unfortunately, some flows have spoofed IP addresses and Abilene doesn’t have completely
accurate ingress filtering. To avoid complications caused by IP address spoofing, we consider an AS to
have a unique attachment to Abilene if it has a unique exit point. We assume that the entry point
for any AS is the same as the exit point for that AS.
When each flow was aggregated to the AS level, 61.1% of the flows entered Abilene at a known
interface and exited Abilene at a known interface.
Table 4.5 shows all autonomous systems that were the destination of more than one percent of
Abilene traffic on March 19, 2003. In addition, 2.3% of the bytes passing through Abilene went
to routable IP addresses that we were not able to translate to an AS number. Notice that NCSA
accounted for a large number of bytes but a very small number of flows and very short duration.
We believe that this might have been UDP traffic for a video teleconference. It is interesting to
notice that AS’s with high byte counts don’t necessarily have high flow counts.
Exterior Delay Estimation
We use the estimation from stretch ACKs shown in Section 4.3 to arrange the list of au-
tonomous systems into sorted order. Only TCP flows with more than six packets and more than
Table 4.5 Highest Volume AS Exits
Dest AS Dest AS Name Country Flows% Octets% Packets% Duration%
237 NSFNETTEST14-AS US 4.083 3.928 3.911 3.652
81 CONCERT US 3.316 3.457 3.832 3.386
786 JANET UK 2.021 2.414 1.959 2.154
680 DFN-WIN-AS DE 1.968 2.157 1.933 2.269
17 PURDUE US 1.826 2.147 2.443 2.957
137 ITALY-AS IT 0.994 1.792 1.244 1.407
32 STANFORD US 1.937 1.613 1.738 1.885
3999 PENN-STATE US 1.646 1.612 1.704 1.849
2150 CSUNET-SW US 1.405 1.468 1.251 1.323
87 INDIANA-AS US 1.553 1.419 2.095 2.606
55 UPENN-CIS US 1.695 1.411 1.519 1.384
2637 GEORGIA-TECH US 1.188 1.384 1.372 1.184
27 UMDNET US 1.879 1.356 1.781 1.953
3582 UONET US 0.499 1.266 1.061 0.832
111 BOSTONU-AS US 1.629 1.260 1.787 2.091
3 MIT-GATEWAYS US 0.710 1.189 1.157 0.803
3794 TAMU US 0.888 1.178 0.940 1.005
7377 UCSD US 1.299 1.168 1.298 1.740
2572 MORENET US 0.866 1.045 0.855 0.800
1224 NCSA-AS US 0.061 1.024 0.528 0.037
four ACKs were considered. Because of the sampling factor (1:100), we can assume that those
flows were long-lived (at least 400 data packets and at least 200 ACK packets). Flow matching
was only done within a 5-minute flow file.
Flows with 4:5 data:ACK ratios were considered to be bandwidth limited either before or after
Abilene. Nearly equal data:ACK ratios indicate that the data packets are arriving less often than
the delayed ACK timeout fires. This implies that the flows are limited by their congestion windows due
to consistent pacing losses. This would be typical if the connections were dial-up modems (56
kbps) or shared a very tight link (typically T1 speed). We draw no conclusions about RTT
for these flows.
Flows with data:ACK ratios above 4:5 but below 11:5 are flows that may be memory limited at
the sender or receiver and using delayed ACKs. Or they could be limited by a congestion window
that is large enough to permit a high percentage of delayed ACKs. We draw no conclusions about
RTT from these flows.
Long-lived flows with data:ACK ratios above 11:5 are likely to be memory limited, rather than
congestion limited. The loss recovery mechanisms break up stretch ACKs and it takes many rounds
for the ACKs to stretch again. Assuming that several connections to the same source (typically a
web server, P2P server, an FTP server, or a similar constantly-willing source of large packets) have
the same send-window memory limit, the difference between their speeds will be strictly due to
differences in RTT. In particular, if two connections have stretch ACKs and one has one-third as
much throughput, it has triple the RTT.
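The classification above can be sketched directly; the category names are mine, while the 4:5 and 11:5 boundaries come from the text.

```python
# Sketch: classify a long-lived flow by its data:ACK packet ratio.

def classify(data_pkts, ack_pkts):
    r = data_pkts / ack_pkts
    if r <= 4 / 5:
        return "bandwidth-limited"   # congestion-window limited; no RTT inference
    if r <= 11 / 5:
        return "ambiguous"           # plain delayed ACKs; no RTT inference
    return "memory-limited"          # stretch ACKs; throughput scales as 1/RTT

assert classify(4, 5) == "bandwidth-limited"
assert classify(8, 5) == "ambiguous"
assert classify(12, 5) == "memory-limited"

# Between two memory-limited flows from the same source, RTTs compare
# inversely to throughputs:
def rtt_multiple(tput_slow, tput_fast):
    return tput_fast / tput_slow     # RTT_slow / RTT_fast

assert rtt_multiple(1.0, 3.0) == 3.0  # one-third the throughput, triple the RTT
```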
As of 2003, memory windows above 64K Bytes or below 16K Bytes are very rare. Since this
range is small, we argue it is justified to assume that memory-limited connections hold a fixed
number of bytes in flight at all times. We estimated this number to be 32K bytes per window. This
allows them to reach a window size of 21 packets, putting them well into the area where TCP’s
loss recovery mechanisms are effective and efficient. A single lost packet would, at worst, cause
the window to drop to 10 packets, allowing the connection to grow back to a window of 21 packets
in 11 RTT rounds.
Table 4.6 Achievable Bandwidth At 32 KByte Memory Limit, 1500 Byte Packets
RTT Windows Per Second Bits Per Second
0.001 1,000 262,144,000
0.002 500 131,072,000
0.010 100 26,214,400
0.020 50 13,107,200
0.100 10 2,621,440
0.200 5 1,310,720
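The entries of Table 4.6 follow from the observation that a memory-limited connection completes one fixed-size window per RTT; a quick recomputation:

```python
# Recompute Table 4.6: throughput of a connection that keeps a fixed
# 32 KByte window in flight at various RTTs.

WINDOW_BITS = 32 * 1024 * 8            # 262,144 bits per window

def bits_per_second(rtt_seconds):
    windows_per_second = 1.0 / rtt_seconds
    return WINDOW_BITS * windows_per_second

for rtt in (0.001, 0.002, 0.010, 0.020, 0.100, 0.200):
    print(f"{rtt:.3f}  {bits_per_second(rtt):>13,.0f}")
# 0.001 -> 262,144,000 bps down to 0.200 -> 1,310,720 bps, matching the table
```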
So the RTT estimation is made based on comparisons of the throughput of long-lived flows
with data:ACK ratios above 11:5. Each flow has a duration and a number of bytes seen by the
sampler. Flows with 9K Bytes or more sampled are assumed to have carried at least 500K
Bytes. Table 4.6 shows how throughput relates to RTT. Flows with RTT less than 2 ms are unlikely
to be memory-limited. Flows with RTT 200 ms will still be able to achieve a window of 21 packets
giving a throughput of 1.3 Mbps.
An AS with a higher incidence of stretch ACKs is assumed to be closer to its Abilene attach-
ment point. This is because the lowest RTT at which stretch ACKs occur is lower for this AS than
for others. We further assume that any packet that crosses the Abilene backbone will travel, at
minimum, double the distance of the shortest Abilene link. For example, Chicago to Indianapolis
is 210 miles. Light through fiber travels at approximately 66% of the speed of light. Even if there
were no time spent getting to the Chicago Abilene site or going from the Indianapolis site, the
minimum RTT for a connection would be over 2 milliseconds.
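A quick back-of-the-envelope check of that floor, using the figures from the text:

```python
# Light in fiber covering double the 210-mile Chicago-Indianapolis link
# already takes more than 2 ms.

C_VACUUM_KM_S = 299_792.458
FIBER_KM_S = 0.66 * C_VACUUM_KM_S      # ~66% of the speed of light
MILES_TO_KM = 1.609344

def propagation_ms(miles):
    return 1000.0 * miles * MILES_TO_KM / FIBER_KM_S

rtt_floor = propagation_ms(2 * 210)    # double the shortest Abilene link
print(rtt_floor)  # ~3.4 ms, comfortably over the 2 ms floor
```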
Categories of Exterior Delay
It would be inappropriate to assume high precision in the RTT estimation since the data used
to make the determination is a small fraction of the total traffic. We chose to use only two broad
categories of Exterior Delay with the intention that one category would represent AS’s in or near
the same city as an Abilene router, and the other would represent AS’s at distances anywhere from
regional to trans-oceanic.
Table 4.7 shows a few examples from the list of equivalence sets. Although MIT might dis-
agree, we considered Harvard and MIT to be equivalent in the sense that their attachment to Abi-
lene was uniquely nycm and they both had similar experience with respect to delayed and stretched
ACKs. All packets destined for the Russian Federal Universities Network (AS 3267) also exit Abi-
lene at nycm, but their traffic shows a much higher proportion of data packets per ACK packet.
The Network Information Service Center (AS 22) has blocks of IP addresses that exit Abilene at
different points. As a result, it is not simple to look at the destination AS for a flow and determine
its exit point. We list AS 22 as ambig.
Table 4.7 Sample Assignment of AS Numbers to Equivalents
ASNum equiv Name Country
3 nycmN MIT-GATEWAYS US
8 hstnN RICE-AS US
9 washN CMU-ROUTER US
11 nycmN HARVARD US
16 snvaN LBL US
17 iplsN PURDUE US
18 hstnN UTEXAS US
22 ambig NOSC US
25 snvaN UCB US
27 washN UMDNET US
29 nycmN YALE-AS US
32 snvaN STANFORD US
34 washN UDELNET US
3267 nycmF RUNNET RU
4671 sttlF GCC-KR KR
6262 sttlF CSIRO AU
Ceiling Estimation
We estimate the ceiling of a flock by adding up the throughput of the connections in that flock.
The same filter is applied as in Section 4.4. This ensures that only long-lived flows (> 6 sampled
data packets) are considered. Each of those flow records has a λr throughput rate in bytes per
second. Each record, r, is assigned to a flock based on the equivalence classes of its source and
destination. The set of all flow records in flock f is FlowRecf . To correctly sum the throughput
rates, λr, we have to adjust each one by the ratio of their duration, Durationr, to the duration of
the measurement period, M = 300 seconds. The total ceiling of all flow records for a given flock,
BpsCeiling_f = (1 / M) × Σ_{r ∈ FlowRec_f} (Duration_r × λ_r)
This ceiling estimate has inherent inaccuracies. It is derived from sampled data, does not
include non-TCP flows, does not include short flows, and does not include traffic from ambiguous
sources or destinations. Moreover, the sampling understates the duration of a flow.
The elements of Ceilingf are then computed from BpsCeilingf so that they represent packets
per tick rather than bytes per second. The selection of a scale factor is sensitive. We scaled
the Ceiling vector so that the mean on the busiest link in our Abilene model matched the link
utilization at the same time of day in the actual Abilene network. As shown in Figure 4.2, the
link from Chicago (CHIN) to Indianapolis (IPLS) had 1.1 Gbps of traffic on test day. The total
of all flow records in that direction on that link was 217 Mbps. The ratio of bits per second seen
in the flow records to bitsPerSecond from the Abilene weather map is samplingScale = 5.069.
Each tick is 10 ms, so ticksPerSecond = 100. The number of bytes per modeled data packet is
bytesPerPacket = 1500. So, ceiling values are converted to packets per tick by the formula:
Ceiling_f = (samplingScale × BpsCeiling_f) / (ticksPerSecond × bytesPerPacket)
Thus, an example flow record at 100,000 bytes per second would contribute about 3.38 packets
per tick using the formula:
Ceiling_f = (5.069 × 100,000) / (100 × 1,500) ≈ 3.38
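The full chain, from flow records to packets per tick, can be sketched with the calibration constants from the text (M = 300 s, samplingScale = 5.069, 10 ms ticks, 1,500-byte packets); the flow-record layout is a hypothetical stand-in.

```python
# Sketch: flock ceiling from duration-weighted flow rates, converted to
# modeled packets per tick.

M = 300.0                  # measurement period in seconds
SAMPLING_SCALE = 5.069     # flow-record bps vs. Abilene weather-map bps
TICKS_PER_SECOND = 100     # one tick = 10 ms
BYTES_PER_PACKET = 1500

def bps_ceiling(flow_records):
    """flow_records: iterable of (duration_seconds, lambda_bytes_per_second)."""
    return sum(dur * lam for dur, lam in flow_records) / M

def ceiling_packets_per_tick(bps):
    """Convert a bytes-per-second ceiling to modeled packets per tick."""
    return SAMPLING_SCALE * bps / (TICKS_PER_SECOND * BYTES_PER_PACKET)

# One flow record at 100,000 bytes/second for the whole 300 s period:
c = ceiling_packets_per_tick(bps_ceiling([(300.0, 100_000.0)]))
print(round(c, 2))  # ~3.38 packets per tick
```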
Simplified Traffic Matrix
Table 4.8 Excerpt from Model Traffic Matrix Estimate
src\dest ambig chinF chinN iplsF iplsN losaF losaN snvaF snvaN Other Total
ambig
chinF 8 14 22
chinN 11 10 14 23 51 109
iplsF 7 7 14
iplsN 9 19 16 50 94
losaF 19 16 7 42 84
losaN 23 26 49
snvaF 12 19 31
snvaN 12 20 32
Other 6 34 3 20 9 16 4 6 467 565
Total 34 80 10 55 27 35 18 45 696 1000
Table 4.8 shows the final simplification of the traffic matrix based on AS equivalents. All traffic
to or from ambiguous AS’s is removed, values are normalized so that total (unambiguous) volume
is 1000, all values are rounded to the nearest integer, and values smaller than an arbitrary minimum,
δ = 3, are merged with a larger flow.
Again, the row and column marked “Other” are purely an artifact of showing the table suc-
cinctly in this thesis. Non-zero values in the matrix represent the volume of traffic that must be
emulated to present a load to the Abilene emulation that approximates the round-trip times and
volumes in the flow data. Blanks are present where the volume is smaller than the minimum δ and
values have been aggregated into other flows. The table does not show the ceilings of the flows in
the model set.
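The simplification steps above can be sketched as follows. Folding a small entry into the largest remaining entry of the same source row is my assumption; the text says only that such values are "merged with a larger flow".

```python
# Sketch: drop ambiguous traffic, normalize to 1000, round, fold entries
# below delta = 3 into a larger flow.

DELTA = 3

def simplify(matrix):
    """matrix: {(src_equiv, dst_equiv): volume}."""
    m = {k: v for k, v in matrix.items() if "ambig" not in k}
    total = sum(m.values())
    m = {k: round(1000.0 * v / total) for k, v in m.items()}
    for k in [k for k, v in m.items() if v < DELTA]:
        small = m.pop(k)
        candidates = [t for t in m if t[0] == k[0]]
        if candidates:                      # fold into the largest same-row entry
            target = max(candidates, key=m.get)
            m[target] += small
    return m

demo = {("chinN", "iplsN"): 500, ("chinN", "losaN"): 1, ("ambig", "iplsN"): 40}
print(simplify(demo))  # {('chinN', 'iplsN'): 1000}
```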
4.5 Traffic Matrix Summary
We have demonstrated that a succinct traffic matrix can be constructed that greatly simplifies
representation of the flows that pass through Abilene for each 5 minute period of a day in the life
of Internet2.
Two crucial parameters for reproducing the behavior of large flows were difficult to obtain from
the vendor statistics gathered from Abilene equipment. Those were the RTT of the flows and the
ceilings (often mis-named external bottleneck bandwidth) of those flows. We showed that both
could be inferred from flow data captured in Abilene by noticing delayed ACK counts and stretch
ACK counts.
The traffic matrix includes parameters that will allow it to be used in explorations of traffic
increases, link additions and link outages. As the demand on Abilene begins to use connections
with higher memory limits or with more multiplexing, these compositional changes in traffic char-
acteristics can be easily accommodated to create a new traffic matrix to run against the model in
Chapter 3.
Additional nodes can be added to the model, but any traffic migration from old nodes to new
nodes and any additional traffic starting or ending at the new nodes would have to be added.
Traffic Matrix Future Work
The traffic matrix forms the basis for delivering traffic to a laboratory-based Abilene emulation.
To apply the traffic to actual routers, PCs will have to accurately emulate the quantity and composi-
tion of the traffic from each source equivalence class to each destination equivalence class. Delays
will be needed before entering the Abilene cloud, inside the cloud, and after exiting the cloud.
Monitoring and measurement will be needed to see if the loss rates and queue delays accurately
reflect those given by the actual Abilene network. This effort will be difficult partly because the
actual Abilene network is very fast, it is difficult to separate out Abilene queuing delay from other
delays, and many Abilene links are nearly lossless.
A major goal of the traffic matrix estimation project was to study the effect of window
synchronization, to validate that a flock-based model has sufficient texture to predict the likelihood
and character of congestion events. Traffic engineering that can avoid chronic congestion is
a worthy goal. If the traffic matrix causes the model to predict congestion events of comparable
duration and intensity to the actual Abilene, it will be a powerful traffic engineering tool. To do this,
we will need to find ways to isolate and measure bursts of losses in both the actual Abilene and the
emulated Abilene.
Much work is needed to validate that the traffic matrix is itself accurate enough for congestion
study. Round trip times are easily measured and the total volume of traffic is straightforward. But,
the addition of flow record rates to create a Ceilingf for each flock is problematic. If future work
could test the reaction of flocks to congestion events, we could watch the rate at which the traffic
grows back after the event. This improved understanding of the traffic’s elasticity and ability to
accelerate back to its ceiling would help us validate or improve our computation for Ceilingf .
The current traffic matrix does not include the Floorf used to indicate the unwillingness of the
flock to go below a lowest traffic rate. A significant fraction of Abilene traffic is open loop traffic
that is either non-responsive to congestion signaling or so short-lived that the response is insignif-
icant. This includes constant bit rate traffic, ICMP and most UDP traffic, and short connections.
Discovering a mechanism to measure Floorf would improve the accuracy of the model in Chap-
ter 3. It would be particularly useful to measure long-lived unresponsive open loop traffic. The
proposals for active queue management that disproportionately drop packets from non-responsive
flows could be validated in an emulation setting if we knew how much volume was non-responsive
in Abilene.
Chapter 5
Related Work
In this chapter we discuss the studies that form the basis for our investigation of global Internet
topology and traffic. Much of the pioneering work has been done by simulating busy links at the
packet-level with repeatable sources of data. These provided substantial insight into the dynamics
of TCP connections or the statistics of packet-level and connection-level behavior.
Our work is particularly informed by the early topology studies using BGP tables to try to draw
useful graphs of the global Internet. Researchers wanted to visualize the Internet and wanted to
model the Internet using simple rules about out-degrees.
We are also indebted to the researchers who created the tools that we used to simulate the
Internet, to trace routes through the Internet, and to measure flows through the Internet. No listing
of related work would be complete without giving credit to the writers of the flowtools and to the
many operators who allow their servers to be used as traceroute servers.
5.1 Topology Related Work
Both router level and inter-domain topology have been studied over the past five years [37, 67,
86, 36, 24]. Our clustering algorithm uses BGP data, so inter-domain topology is most relevant
to this work. In [36], Govindan and Reddy characterize inter-domain topology and route stability
using BGP routing table information collected over a one year period. In that work the authors
describe inter-domain topology in terms of diameter, degree distribution and connectivity charac-
teristics. Inter-domain routing information can be collected from a number of public sites including
NLANR [32], Merit [38] and Route Views [90] (our source of routing information). These sites
provide BGP tables from looking glass routers located in various places in the Internet and peered
with a large number of ISPs.
Routing characteristics have also been widely studied in the context of topology. Examples
include [3, 36, 70]. These studies inform our work with respect to the structural characteristics of
end-to-end Internet paths.
Clustering, Caching and Content Delivery
Our clustering algorithm is analogous to generating a spanning tree for the AS graph. Prim’s
algorithm [75] is a standard method for constructing a minimum spanning tree if the root of the
tree is known in advance. Starting at the root, use a breadth-first search to find all nodes. Each edge
that lies on a shortest path from the root to any other node is a member of the minimum spanning
tree. We cannot use Prim’s algorithm since our graph does not have a pre-defined root. Kruskal’s
algorithm [48] does not require a starting point. It constructs a spanning forest that initially con-
tains a tiny tree for each vertex. Trees are then combined by coalescing them at the shortest edges
first. Any edge that does not cross between trees is redundant and any edge left over after all of the
vertices have been visited is similarly not needed. Although Kruskal’s algorithm serves as the in-
spiration for our algorithm, we still had to address the stopping criteria, since declaring a single root
for the entire Internet would have artificially added several hops in the core of the Internet, where
a dozen of the biggest transit providers are almost completely interconnected. Kruskal’s algorithm
finds a minimal spanning tree in the sense that the total of the edge lengths is minimized, even
if it makes the tree deep. Our goal was subtly different, since we want a tree that has maximum
fidelity to the traffic flow in the Internet. In particular, we want a shallow tree so that node repre-
sentations are not mistakenly far from the backbone. Even outside the core, Kruskal’s algorithm
produces trees that are inappropriately deep when presented with neighborhoods of completely
interconnected vertices.
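For reference, the textbook Kruskal's algorithm named above as our inspiration can be sketched with a union-find structure; this is the standard version, not our modified algorithm with its different stopping criteria.

```python
# Sketch: Kruskal's minimum spanning tree with union-find (path halving).

def kruskal(n, edges):
    """edges: (weight, u, v) tuples over vertices 0..n-1; returns MST edges."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):           # coalesce at the shortest edges first
        ru, rv = find(u), find(v)
        if ru != rv:                        # edge crosses two trees: keep it
            parent[ru] = rv
            mst.append((w, u, v))
    return mst

edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2)]
print(kruskal(3, edges))  # [(1, 0, 1), (2, 1, 2)]; edge (3, 0, 2) is redundant
```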
Initial work on clustering clients and proxy placement was done by Cunha in [18]. That work
described a process of using traceroute to generate a tree graph of client accesses (using IP ad-
dresses collected from a Web server’s logs). Proxies were then placed in the tree using three
different algorithms and the effects on reduction of server load and network traffic were evaluated.
Our work differs from this in our use of AS level information from BGP routing tables to create a
tree which is simpler and more efficient. Our cache placement algorithms differ in that the coarser
aggregation allows us to use a method that guarantees optimal placement. The next significant
work on client clustering was done by Krishnamurthy and Wang in [47]. In that work, the authors
merge the longest prefix entries (i.e., those with the most detail) from a set of 14 BGP routing
tables. This creates a prefix/netmask table of approximately 390K possible clusters. IP addresses
from Web server logs are then clustered by finding the longest prefix match in the prefix/netmask
table. While this approach generates client clusters which are topologically close and of minimal
size, it does not provide for further levels of aggregation of clusters.
Content distribution companies (e.g., Akamai) and wide area load balancing product ven-
dors (e.g., Cisco, Foundry and Nortel) also use the notion of client clustering to redirect client
requests to distributed caches. These companies use the Domain Name System (DNS) [61] as
a means for both determining client location and redirecting requests. The assumption made in
DNS-redirection is that clients whose DNS requests come from the same DNS server are topologi-
cally close to each other. Initial work in [46] evaluates the performance of redirection schemes that
access documents from multiple proxies versus a single proxy and shows that retrieving embedded
objects from a single page from different servers is sub-optimal. Subsequent work in [85] indi-
cates that clients and their nameservers are frequently neither topologically close nor close from
the perspective of packet latency. However, Myers et al. show that the ranking of download times
of the same three sites from 47 different mirrors was stable [62].
Caching has been widely studied as a means for enhancing performance in the Internet during
the 1990’s. These studies include cache traffic evaluation [6, 8], replacement algorithm perfor-
mance [91, 19], cache hierarchy architecture [34, 58] and cache appliance design [11, 15]. A
number of recent papers have addressed the issue of proxy placement based on assumptions about
the underlying topological structure of the Internet [52, 43, 79]. In [52], Li et al. describe an opti-
mal dynamic programming algorithm for placing multiple proxies in a tree-based topology. Their
algorithm is comparable to ours although it is less efficient. It places M proxies in a tree with N
nodes and operates in O(N³M²) time, whereas our algorithm operates in O(NM² log N). Jamin
et al. examine a number of proxy placement algorithms under the assumption that the underly-
ing topological structure is not a tree. Their results show quickly diminishing benefits of placing
additional mirrors (defined as proxies which service all client requests directed to them) even us-
ing sophisticated and computationally intensive techniques. In [79], Qiu et al. also evaluate the
effectiveness of a number of graph theoretic proxy placement techniques. They find that proxy
placement that considers both distance and request load performs a factor of 2 to 5 better than a
random proxy placement. They also find that a greedy algorithm for mirror placement (one which
simply iteratively chooses the best node as the site for the next mirror) performs better than a tree
based algorithm.
5.2 Backbone Delay and Loss Related Work
Packet delay and loss behavior in the Internet has been widely studied. Examples include [5]
which established basic properties of end-to-end packet delay and loss based on analysis of active
probe measurements between two Internet hosts. That work is similar to ours in terms of evaluating
different aspects of packet delay distributions. Paxson provided one of the most thorough studies
of packet dynamics in the wide area in [72]. While that work treats a broad range of end-to-end
behaviors, the sections that are most relevant to our work are the statistical characterizations of
delays and loss. The important aspects of scaling and correlation structures in local and wide
area packet traces are established in [51, 73]. Feldmann et al. investigate multifractal behavior of
packet traffic in [26]. That simulation-based work identifies important scaling characteristics of
packet traffic at both short and long timescales. Yajnik et al. evaluated correlation structures in
loss events and developed Markov models for temporal dependence structures [92]. Recent work
by Zhang et al. [94] assesses three different aspects of constancy in delay and loss rates.
There are a number of widely deployed measurement infrastructures which actively measure
wide area network characteristics [77, 63, 55]. These infrastructures use a variety of active probe
tools to measure loss, delay, connectivity and routing from an end-to-end perspective. Recent work
by Pasztor and Veitch identifies limitations in active measurements, and proposes an infrastructure
using the Global Positioning System (GPS) as a means for improving accuracy of active probes
[69]. That infrastructure is quite similar to Surveyor [77], which was used to gather the data used
in our study.
A variety of methods have been employed to model network packet traffic including queuing
and auto-regressive techniques [42]. While these models can be parameterized to recreate observed
packet traffic time series, parameters for these models often do not relate to network properties.
Models for TCP throughput have also been developed in [54, 65, 16]. These models use RTT and
packet loss rates to predict throughput, and are based on characteristics of TCP’s different operating
regimes. Our work uses simpler parameters that are more directly tuned by traffic engineering.
Fluid-Based Analysis
Determining the capacity of a network with multiple congested links is a complex problem.
Misra proposed fluid-based analysis [60], employing stochastic differential equations to model
flows almost as though they were water moving through pipes. Bu used a fixed-point approach
[10] that focuses on predicting router average queue lengths. Both methods are fast enough to use
in “what if” scenarios for capacity planning or performance analysis. Both methods take, as input
parameters, a set of link capacities, the associated buffer capacities, and a set of sessions where
each session takes a path that includes an ordered list of links. Our model uses essentially the
same input parameters. We expect that the results of these models would be complementary to our
results and suggest that traffic engineers use a fluid-based analysis as a point of comparison for
our window synchronization model. Our expectation is that fluid-based analyses might overstate
capacity where our model would understate it.
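Both analyses consume the same style of input described above. A minimal representation of those parameters (the class and field names are hypothetical, for illustration only) might look like:

```python
from dataclasses import dataclass, field

@dataclass
class Link:
    capacity_bps: float   # link capacity
    buffer_bytes: int     # buffer capacity of the associated router queue

@dataclass
class Session:
    # ordered list of Links along the path this session traverses
    path: list = field(default_factory=list)
```

A network instance is then just a collection of Links plus a set of Sessions whose paths reference them, which is exactly the input shape both the fluid-based and fixed-point models share with ours.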
Other Forms of Global Synchronization
The tendency of traffic to synchronize was first reported by Floyd and Jacobson [29]. Their
study found resonance at the packet level when packets arrived at gateways from two nearly equal
senders. Deterministic queue management algorithms like drop-tail could systematically discrim-
inate against some connections. This paper formed the earliest arguments in favor of RED. This
form of global synchronization is the synchronization of losses when a router drops many consec-
utive packets in a short period of time. Fast retransmit was added to TCP to mitigate the immediate
effects. The next form of global synchronization was synchronization of retransmissions when the
TCP senders retransmit dropped packets virtually in unison.
In contrast, window synchronization is the alignment of congestion window saw-tooth behavior.
Packet-level resonance was never shown to extend to more than a few connections. Qiu, Zhang
and Keshav [80] found that global synchronization can result when a small number of connections
share a bottleneck at a slow link with a large buffer, independent of the mixture of RTTs. Increas-
ing the number of connections prevents the resonance. Window synchronization is the opposite.
Window synchronization scales to large numbers of connections, but a broad mixture of RTTs
prevents the resonance.
5.3 Related Work in Traffic Matrix Estimation
Much of the prior work on traffic matrix estimation starts with the assumption that sources
and destinations are not known. Our work differs in that the volume of data is directly read from
flow data, rather than trying to find a way to infer volume from link utilization and other SNMP
statistics. In contrast, this thesis focused on ways to infer exterior delay and exterior ceilings for
connections. Very few techniques have been proposed that take into account the way TCP (and
TCP-friendly) connections react to changes in the interior of the network.
Linear Programming Approach
Goldschmidt [35] suggested an innovative technique for discovering a set of source-to-destination
flows that satisfy a list of link utilizations. Goldschmidt saw this as an optimization problem and
posed a linear program (LP) to attempt to compute the traffic matrix directly. Since there are an
infinite number of feasible solutions that correctly satisfy the link utilizations, Goldschmidt imposes
linear constraints on the solution based on the differences across time. Subsequent researchers
[57] found that the technique produced error rates that were “probably too high to be acceptable
by ISPs” and that the technique is “highly sensitive to noise” in the raw input data.
Gravity Modeling Approach
Zhang et al. [95] developed a very fast technique for estimating the traffic matrix using gravity
modeling. If one simply assumes a proportionality relationship between the total traffic entering
the network and the total traffic leaving the network at each perimeter point, the traffic on interior
links can be inferred. Starting with the edges, they incorporate both BGP data exchanged with peer
networks and routing information about the interior of the ISP. The relative strength of the interac-
tion between any two nodes is modeled as though they had gravity according to Newton’s law of
gravitation. They call this mixture of gravity techniques and tomography techniques tomogravity.
It would be interesting to use Zhang’s techniques to model more than simply the traffic volume.
Gravity techniques could be very useful in estimating the demands that would move to a new node
if a node were to be added to our existing Abilene model.
Expectation Maximization Approach
Cao et al. [14] incorporated multiple sets of link measurements, assuming these were IID
variables. There are many situations where a maximum likelihood estimate (MLE) is not
straightforward due to the absence of data, so they applied an Expectation Maximization (EM)
algorithm that provides an iterative procedure for computing MLEs. That, in turn, led them to the
problem of estimating an initial matrix prior to initiating the iterative procedure.
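The flavor of such an iterative refinement can be shown with iterative proportional fitting, a simpler relative of the EM procedure of Cao et al. (this is not their algorithm): starting from an initial guess, the matrix is alternately rescaled until its row and column sums approach the measured totals.

```python
def ipf(matrix, row_targets, col_targets, iters=50):
    """Alternately rescale rows and columns of an initial guess so that
    the margins converge toward the measured totals."""
    m = [row[:] for row in matrix]
    for _ in range(iters):
        for i, target in enumerate(row_targets):
            s = sum(m[i])
            if s > 0:
                m[i] = [x * target / s for x in m[i]]
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in m)
            if s > 0:
                for row in m:
                    row[j] *= target / s
    return m
```

As with EM, the quality of the result depends on the initial matrix, which is exactly the problem Cao et al. had to confront before iterating.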
This approach could also be interesting if it could be turned to the problem of estimating
connection ceiling throughput.
Reaction of TCP to Congestion
Memory limitations were quickly recognized as an impediment to throughput, but the research
community lost interest after RFC 1323 [41] established a mechanism for window scaling. This
allowed a single TCP connection to have a very large amount of data (potentially 2^30 bytes)
unacknowledged in transit. It is now possible to allocate large amounts of memory for a single TCP
connection, but the default settings for popular operating systems are typically much smaller.
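The 2^30 figure follows from simple arithmetic: RFC 1323 left-shifts the 16-bit advertised window by a negotiated scale factor of at most 14. The bandwidth-delay example below uses illustrative numbers assumed here, not figures from the thesis.

```python
# RFC 1323: the 16-bit window field may be shifted left by at most 14 bits.
MAX_UNSCALED = 2**16 - 1                 # 65,535 bytes, the classic TCP limit
MAX_SCALE = 14
max_window = MAX_UNSCALED << MAX_SCALE   # just under 2**30 bytes

# Why it matters: keeping a hypothetical 1 Gb/s, 100 ms RTT path full
# requires a bandwidth-delay product far beyond the unscaled limit.
bdp_bytes = int(1e9 / 8 * 0.100)         # bytes that must be in flight
```

Without window scaling, such a path would be limited to roughly 64 KB in flight per connection regardless of available memory.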
Stretch ACKs
Delayed ACKs and Stretch ACKs have been common in TCP since RFC 1122. Although that
RFC was written clearly, there was a period of confusion among vendors about whether stretch
ACKs were considered legal. RFC 2525 [23] discusses specific bugs that cause
stretch ACKs and describes the impact of stretch ACKs. RFC 2581 [88] establishes that a TCP
implementation SHOULD generate an ACK for at least every second full-sized segment. RFC 2581
unambiguously states that an implementation may generate ACKs less frequently “after careful
consideration of the implications”.
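The RFC 2581 rule can be made concrete with a small counting sketch (the function and its parameters are illustrative, not drawn from any RFC): a receiver that ACKs every second full-sized segment halves the ACK stream, while a stretch ACK covers even more segments per ACK.

```python
def acks_generated(segments, ack_every=2):
    """Count ACKs from a receiver acknowledging every `ack_every`
    full-sized segments; ack_every > 2 models a stretch ACK."""
    acks, pending = 0, 0
    for _ in range(segments):
        pending += 1
        if pending == ack_every:
            acks += 1
            pending = 0
    if pending:
        acks += 1   # the delayed-ACK timer eventually covers the remainder
    return acks
```

Fewer ACKs mean fewer opportunities for the sender's ACK clock to grow the congestion window, which is why stretch ACKs affect TCP dynamics.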
The importance of adding RTT into any study of TCP congestion has been widely reported
[2, 30] and cannot be overstated.
Parameters in the Traffic Matrix Estimate
Medina et al. [57] use choice models in the Sprint Network Analysis Toolkit to generate high
quality starting points to improve the behavior of earlier statistical techniques. The choice model
acts as though each ingress node chooses an egress node for each packet so as to maximize a utility
function. The combination of features of an egress POP (total capacity, number of customers /
peers, etc.) makes it more or less attractive to a particular ingress node. The results were applied to
a tier-1 ISP and found to be sufficiently accurate to match actual Internet volume data.
This effectively increased the number of parameters in the model space to include information
about each egress. While our thesis has already increased the parameter space by adding RTT and
ceiling, we can imagine ways to further improve accuracy by characterizing egress points by the
composition of the traffic they attract. For example, an egress point that is popular for streaming
video may be statistically very different from an egress point that emphasizes very short HTTP
transactions.
LIST OF REFERENCES
[1] Internet2 Abilene Project. http://abilene.internet2.edu, 2003.
[2] A. Aggarwal, S. Savage, and T. Anderson. Understanding the performance of TCP pacing. In Proceedings of IEEE INFOCOM '00, Tel Aviv, Israel, March 2000.
[3] M. Allman and V. Paxson. On estimating end-to-end network path properties. In Proceedings of ACM SIGCOMM '99, Boston, MA, September 1999.
[4] G. Almes, S. Kalidindi, and M. Zekauskas. A one-way delay metric for IPPM. RFC 2679, September 1999.
[5] J. Bolot. End-to-end packet delay and loss behavior in the Internet. In Proceedings of ACM SIGCOMM '93, San Francisco, September 1993.
[6] H. Braun and K. Claffy. Web traffic characterization: An assessment of the impact of caching documents from NCSA's Web server. In Proceedings of the Second International WWW Conference, Chicago, IL, October 1994.
[7] H. Braun, K. Claffy, and G. Polyzos. A framework for flow-based accounting on the Internet. In Singapore International Conference on Networks, SICON93, Singapore, 1993.
[8] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker. Web caching and Zipf-like distributions: Evidence and implications. In Proceedings of IEEE INFOCOM '99, New York, NY, March 1999.
[9] A. Broido and kc claffy. Internet topology: connectivity of IP graphs. Technical report, CAIDA, http://www.caida.org/outreach/papers/topologylocal, 2001.
[10] T. Bu and D. Towsley. Fixed point approximations for TCP behavior in an AQM network. In Proceedings of ACM SIGMETRICS '01, 2001.
[11] Squid Internet Object Cache. http://www.nlanr.net/squid, 2001.
[12] J. Cao, W. Cleveland, D. Lin, and D. Sun. The effect of statistical multiplexing on the long range dependence of Internet packet traffic. Bell Labs Tech Report, 2002.
[13] J. Cao, W. Cleveland, D. Lin, and D. Sun. Internet traffic: Statistical multiplexing gains. DIMACS Workshop on Internet and WWW Measurement, Mapping and Modeling, 2002.
[14] J. Cao, D. Davis, S. Vanderweil, and B. Yu. Time-varying network tomography. In Journal of the American Statistical Association, 2000.
[15] P. Cao, J. Zhang, and K. Beach. Active cache: Caching dynamic contents on the Web. Distributed Systems Engineering, 6(1), 1999.
[16] N. Cardwell, S. Savage, and T. Anderson. Modeling TCP latency. In Proceedings of IEEE INFOCOM '00, Tel Aviv, Israel, March 2000.
[17] H. Chang, R. Govindan, S. Jamin, S. Shenker, and W. Willinger. Towards capturing representative AS-level Internet topologies. In Proceedings of ACM SIGMETRICS '02, 2002.
[18] C. Cunha. Trace Analysis and its Applications to Performance Enhancements of Distributed Information Systems. PhD thesis, Boston University, 1997.
[19] J. Dilly and M. Arlitt. Improving proxy cache performance: Analysis of three replacement policies. IEEE Internet Computing, 3(6), November 1999.
[20] D. Clark et al. Looking over the fence at networks: A neighbor's view of networking research. SIGCOMM, 2001.
[21] H. Balakrishnan et al. TCP performance implications of network path asymmetry. RFC 3449, 2002.
[22] R. Braden et al. Requirements for Internet hosts – communication layers. IETF RFC 1122, 1989.
[23] V. Paxson et al. Known TCP implementation problems. RFC 2525, 1999.
[24] M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the Internet topology. In Proceedings of ACM SIGCOMM '99, Boston, MA, September 1999.
[25] A. Feldmann, A. Gilbert, W. Willinger, and T. Kurtz. The changing nature of network traffic: Scaling phenomena. Computer Communications Review, 28(2), April 1998.
[26] A. Feldmann, P. Huang, A. Gilbert, and W. Willinger. Dynamics of IP traffic: A study of the role of variability and the impact of control. In Proceedings of ACM SIGCOMM '99, Boston, MA, September 1999.
[27] S. Floyd. Connections with multiple congested gateways in packet-switched networks part 1: One-way traffic. ACM Computer Communications Review, 21(5):30–47, October 1991.
[28] S. Floyd and V. Jacobson. Random early detection gateways for congestion avoidance. IEEE/ACM Transactions on Networking, 1(4):397–413, August 1993.
[29] S. Floyd and V. Jacobson. Traffic phase effects in packet-switched gateways. Journal of Internetworking: Practice and Experience, 3(3):115–156, September 1992.
[30] S. Floyd and E. Kohler. Internet research needs better models. In HotNets-I, October 2002.
[31] S. Floyd and V. Paxson. Why we don't know how to simulate the Internet. In Proceedings of the 1997 Winter Simulation Conference, December 1997.
[32] National Laboratory for Applied Network Research. http://www.nlanr.net, 1998.
[33] L. Gao. On inferring autonomous system relationships in the Internet. In IEEE Global Internet Symposium, November 2000.
[34] S. Glassman. A caching relay for the World Wide Web. Computer Networks and ISDN Systems, 27(2), 1994.
[35] O. Goldschmidt. ISP backbone traffic inference methods to support traffic engineering. In Internet Statistics and Metrics Workshop '00, San Diego, CA, December 2000.
[36] R. Govindan and A. Reddy. An analysis of Internet inter-domain topology and route stability. In Proceedings of IEEE INFOCOM '97, Kobe, Japan, April 1997.
[37] R. Govindan and H. Tangmunarunkit. Heuristics for Internet map discovery. In Proceedings of IEEE INFOCOM '00, April 2000.
[38] Merit Internet Performance Measurement and Analysis Project. http://nic.merit.edu/ipma/, 1998.
[39] Internet Protocol Performance Metrics. http://www.ietf.org/html.charters/ippm-charter.html, 1998.
[40] V. Jacobson. Congestion avoidance and control. In Proceedings of ACM SIGCOMM '88, pages 314–332, August 1988.
[41] V. Jacobson, R. Braden, and D. Borman. TCP extensions for high performance. IETF RFC 1323, May 1992.
[42] D. Jagerman, B. Melamed, and W. Willinger. Stochastic modeling of traffic processes. In Frontiers in Queuing: Models, Methods and Problems, CRC Press, 1996.
[43] S. Jamin, C. Jin, A. Kurc, D. Raz, and Y. Shavitt. Constrained mirror placement on the Internet. In Proceedings of IEEE INFOCOM '01, Anchorage, AK, April 2001.
[44] S. Kalidindi. OWDP implementation, v1.0, http://telesto.advanced.org/ kalidindi, 1998.
[45] S. Kalidindi and M. Zekauskas. Surveyor: An infrastructure for Internet performance measurements. In Proceedings of INET '99, June 1999.
[46] J. Kangasharju, K. Ross, and J. Roberts. Performance evaluation of redirection schemes in content distribution networks. In Proceedings of the 5th Web Caching and Content Distribution Workshop, Lisbon, Portugal, June 2000.
[47] B. Krishnamurthy and J. Wang. On network-aware clustering of Web clients. In Proceedings of ACM SIGCOMM '00, Stockholm, Sweden, September 2000.
[48] J. Kruskal. On the shortest spanning subtree of a graph and the traveling salesman problem. In Proceedings of the American Mathematical Society, 1956.
[49] Wisconsin Advanced Internet Lab. http://wail.cs.wisc.edu, 2002.
[50] K. Lahey. TCP problems with path MTU discovery. RFC 2923, 2000.
[51] W. Leland, M. Taqqu, W. Willinger, and D. Wilson. On the self-similar nature of Ethernet traffic (extended version). IEEE/ACM Transactions on Networking, 2(1):1–15, 1994.
[52] B. Li, M. Golin, G. Italiano, X. Deng, and K. Sohraby. On the optimal placement of Web proxies in the Internet. In Proceedings of IEEE INFOCOM '99, New York, NY, March 1999.
[53] M. Mathis and J. Mahdavi. Enabling high performance data transfers. http://www.psc.edu/networking/perf tune.html, 2003.
[54] M. Mathis, J. Semke, J. Mahdavi, and T. Ott. The macroscopic behavior of the TCP congestion avoidance algorithm. Computer Communications Review, 27(3), July 1997.
[55] W. Matthews and L. Cottrell. The PingER project: Active Internet performance monitoring for the HENP community. IEEE Communications Magazine, May 2000.
[56] M. May, T. Bonald, and J.-C. Bolot. Analytic evaluation of RED performance. In Proceedings of IEEE INFOCOM '00, Tel Aviv, Israel, March 2000.
[57] A. Medina, N. Taft, K. Salamatian, S. Bhattacharyya, and C. Diot. Traffic matrix estimation: Existing techniques and new directions. In Proceedings of ACM SIGCOMM '02, August 2002.
[58] S. Michel, K. Nguyen, A. Rosenstein, S. Floyd, and V. Jacobson. Adaptive Web caching: Towards a new global caching architecture. In Proceedings of the 3rd Web Caching Workshop, Manchester, England, June 1998.
[59] J. S. Mill. A System of Logic, Ratiocinative and Inductive: Being a Connected View of the Principles of Evidence, and Methods of Scientific Investigation. J. W. Parker, London, 1843.
[60] V. Misra, W. Gong, and D. Towsley. Fluid-based analysis of a network of AQM routers supporting TCP flows with an application to RED. In Proceedings of ACM SIGCOMM '00, pages 151–160, 2000.
[61] P. Mockapetris. Domain names - concepts and facilities. IETF RFC 1034, November 1987.
[62] A. Myers, P. Dinda, and H. Zhang. Performance characteristics of mirror servers on the Internet. In Proceedings of IEEE INFOCOM '99, New York, NY, March 1999.
[63] NLANR Active Measurement Program - AMP. http://moat.nlanr.net/AMP.
[64] Abilene NOC. http://loadrunner.uits.iu.edu/weathermaps/abilene, 2003.
[65] J. Padhye, V. Firoiu, D. Towsley, and J. Kurose. Modeling TCP throughput: A simple model and its empirical validation. In Proceedings of ACM SIGCOMM '98, Vancouver, Canada, September 1998.
[66] V. Padmanabhan, L. Qiu, and H. Wang. Server-based inference of Internet link lossiness. In Proceedings of IEEE INFOCOM '03, 2003.
[67] J.-J. Pansiot and D. Grad. On routes and multicast trees in the Internet. Computer Communications Review, 28(1), January 1998.
[68] K. Papagiannaki, S. Moon, C. Fraleigh, P. Thiran, F. Tobagi, and C. Diot. Analysis of measured single-hop delay from an operational backbone network. In Proceedings of IEEE INFOCOM '02, March 2002.
[69] A. Pasztor and D. Veitch. A precision infrastructure for active probing. In PAM2001, Workshop on Passive and Active Networking, Amsterdam, The Netherlands, April 2001.
[70] V. Paxson. End-to-end routing behavior in the Internet. In Proceedings of ACM SIGCOMM '96, Palo Alto, CA, August 1996.
[71] V. Paxson. End-to-end Internet packet dynamics. In Proceedings of ACM SIGCOMM '97, Cannes, France, September 1997.
[72] V. Paxson. Measurements and Analysis of End-to-End Internet Dynamics. PhD thesis, University of California, Berkeley, 1997.
[73] V. Paxson and S. Floyd. Wide-area traffic: The failure of Poisson modeling. IEEE/ACM Transactions on Networking, 3(3):226–244, June 1995.
[74] D. Plonka. FlowScan: A network traffic flow reporting and visualization tool. In LISA 2000, December 2000.
[75] R. Prim. Shortest connection networks and some generalizations. Bell System Technical Journal, 36:1389–1401, 1957.
[76] The Netcity Project. http://www.cs.wisc.edu/netcity, 2001.
[77] The Surveyor Project. http://www.advanced.org/surveyor, 1998.
[78] The Web100 Project. http://www.web100.org, 2002.
[79] L. Qiu, V. Padmanabhan, and G. Voelker. On the placement of Web server replicas. In Proceedings of IEEE INFOCOM '01, Anchorage, AK, April 2001.
[80] L. Qiu, Y. Zhang, and S. Keshav. Understanding the performance of many TCP flows. Computer Networks, 37(3–4):277–306, 2001.
[81] K. Ramakrishnan and S. Floyd. A proposal to add explicit congestion notification (ECN) to IP. IETF RFC 2481, January 1999.
[82] Y. Rekhter and P. Gross. Application of the border gateway protocol in the Internet. IETF RFC 1772, 1995.
[83] Y. Rekhter and T. Li. A border gateway protocol 4. IETF RFC 1771, 1995.
[84] R. Hamming. Error detecting and error correcting codes. Bell System Technical Journal, 29:147–160, 1950.
[85] A. Shaikh, R. Tewari, and M. Agrawal. On the effectiveness of DNS-based server selection. In Proceedings of IEEE INFOCOM '01, Anchorage, AK, April 2001.
[86] R. Siamwalla, R. Sharma, and S. Keshav. Discovering Internet topology. Technical report, Cornell University Computer Science Department, July 1998. http://www.cs.cornell.edu/skeshav/papers/discovery.pdf.
[87] W. Stevens. TCP/IP Illustrated, Volume 1: The Protocols. Addison-Wesley, 1994.
[88] W. Stevens, M. Allman, and V. Paxson. TCP congestion control. RFC 2581, April 1999.
[89] UCB/LBNL/VINT Network Simulator - ns (version 2). http://www.isi.edu/nsnam/ns/, 2000.
[90] Route Views, University of Oregon. http://www.antc.uoregon.edu/routeviews.
[91] R. Wooster and M. Abrams. Proxy caching that estimates page load delays. In Sixth International World Wide Web Conference, Santa Clara, CA, 1997.
[92] M. Yajnik, S. Moon, J. Kurose, and D. Towsley. Measurement and modeling of temporal dependence in packet loss. In Proceedings of IEEE INFOCOM '99, New York, NY, March 1999.
[93] L. Zhang, S. Shenker, and D. Clark. Observations on the dynamics of a congestion control algorithm: The effects of two-way traffic. In Proceedings of ACM SIGCOMM '91, 1991.
[94] Y. Zhang, N. Duffield, V. Paxson, and S. Shenker. On the constancy of Internet path properties. In Proceedings of ACM SIGCOMM Internet Measurement Workshop '01, San Francisco, November 2001.
[95] Y. Zhang, M. Roughan, N. Duffield, and A. Greenberg. Fast accurate computation of large-scale IP traffic matrices from link loads. In Proceedings of ACM SIGMETRICS, 2003.
Vita
James Alan Gast was born in Milwaukee, Wisconsin U.S.A. on September 14, 1950 to Patricia
Aronson Gast and Irving Bernard Gast. His third grade teacher was his mother and there were two
other boys in the class named “James”. One became Jim, one became Jimmy, and James Gast (to
this very day) signs his name the way he learned in third grade: James A. Gast. Friends call him
“Jim” so that he won’t descend into classroom courtesy.
The Gast family moved to Park Forest, IL in 1955, where Jim met his bride-to-be in kindergarten
at Dogwood School. His primary and secondary education were spent in the south suburbs of
Chicago. During Jim’s Sophomore year in High School, the Gast family hosted a foreign exchange
student from Mexico. Although Jim had 2 years of Latin and only brief training in Hebrew, Spanish
came easily and he enrolled in Spanish III, skipping Spanish I and II. In the summer of 1967, before
his Senior year in High School, Jim studied in Durango, Mexico. Jim's two years of High School
Spanish were Spanish III and Spanish V.
Jim took his Bachelor’s Degree at the University of Illinois in Urbana. In 1970, he married
Anne Stafford and started accumulating dogs, cats, and, eventually, sons. He changed majors from
Electrical Engineering (with Computer Science) to Math (with Computer Science) to Philosophy
(Logic) before the University finally approved a Computer Science major. There was a problem,
however, because the College of Engineering required Physics 107 (Electricity) and Physics 108
(Magnetism). The Dean accepted 2 semesters of Spanish Literature as replacement credit, and Jim
graduated with a Bachelor of Science in Computer Science in 1973. To this day both his Spanish
and his Physics are rusty.
While in school, Jim worked at the Computer-Based Education Research Lab on the PLATO
project in the team that wrote TUTOR, a courseware language that was still actively being used
20 years later. After graduating, Jim took a position on academic staff at the Center for Advanced
Computation, writing many early applications to enable the use of ILLIAC IV over the ARPANET,
including a remote job entry system. Jim was active in the early standardization of Initial Connec-
tion Protocols that became parts of TCP/IP.
In 1976, Jim co-founded Champaign Computer Company, making 8-bit computers for hob-
byists and local businesses. The only persistent storage was floppies (80 KBytes per side), and a
computer with 48 KBytes of RAM was considered huge.
In 1980, Jim signed on with Systems and Programming Resources to be a consultant to Bell
Labs in Naperville, IL. During the next 3 years, Jim worked as a senior developer for Bell Labs
Network, a 7-layer ISO-modeled network connecting Western Electric and Bell Labs mainframes
with UNIX computers all over the United States. At that time, files on mainframe disks had no
notion of ownership or permissions, since disks were just temporary storage. Permanent files were
kept on tapes.
Jim was Product Development Manager at Tellabs in Lisle, IL where he designed the data
switching products. When X.25 was standardized, Jim’s team created the multiplexers and packet
switches that were sold by AT&T. Jim was active in the standardization of X.25 and X.75.
In 1987, Jim co-founded Palindrome Corporation with his 2 best friends. All 3 were from
Tellabs and, before that, Bell Labs. The software development lifecycle was formalized before
the very first product was written. All changes went through change management, and bugs and
suggestions were tracked through the entire process until product end-of-life. Jim was the architect
and designer of the entire line of network backup, archiving, file migration and business continuity
planning products. Jim was also active in the Optical Storage Technology Association and was the
founding secretary of the System Independent Data Format Association. In 1994, after Palindrome
had grown to 150 employees, it was acquired by Seagate.
During this time, Jim served at the local level as an officer in the Local Area Network Dealers
Association and the Novell Users Group. Jim wrote articles in Computer Technology Review and
was quoted several times in Byte Magazine (Jerry Pournelle called Jim "an information preservation
fanatic") and LAN Magazine. At the international level, he was Chairman of the Professionalism
and Ethics Committee of the Network Professionals Association.
In 1995, Jim joined Novell and worked his way up to Corporate Software Architect. During his
tenure, the 64-bit journaled file system was developed and unveiled and replication services were
completed.
During this time Jim hosted the formation meeting of the Storage Networking Industry Association.
He was also Novell's representative to The Open Group (the merger of X/Open and the
Open Systems Foundation). Jim sat on the Architecture Board and the Technical Managers Forum
when The Open Group standardized UNIX95. Jim was the Chairman of the Professionalism and
Ethics Committee of the Network Professionals Association while it grew to 10,000 members. He
was also a founder of the System Independent Data Format Association (SIDF) and served as the
secretary during the entire process of making SIDF into the ECMA-208 and ISO-14863 tape and
optical disk file formats. He served on the Optical Storage Technology Association from the design
of the Universal Disk Format (UDF) until it was adopted for DVDs.
One day in 1996, while Jim was flying to yet another meeting, a fellow traveler started talking
about retirement. Each had a lifelong love of teaching and heartfelt respect for teachers at all levels.
The two complete strangers decided that they would teach undergrads when they retired. But, to
do that Jim needed a Ph.D.
So, in 1998, Jim stopped being an empty suit and started attending graduate school at the
University of Wisconsin - Madison. He was fortunate to be at Madison during the creation of
the Wisconsin Advanced Internet Lab, and spent many pleasant hours there learning alongside the
smartest (and most genuine) people in the world.
Jim has an older brother Michael (who beat him to a Ph.D. by 26 years) and sons, Peter (who
got his BS/Computer Science from MIT in 1993), Brian (Iowa State University, Ames), Jeremy
(University of Illinois, Urbana), and Daniel (Daniel Webster College, Nashua, NH).
Jim will become a member of the Computer Science and Software Engineering faculty at the
University of Wisconsin at Platteville in August, 2003.
jgast@cs.wisc.edu
August 4, 2003
Madison, Wisconsin