
HELSINKI UNIVERSITY OF TECHNOLOGY

Department of Computer Science and Engineering

Telecommunications Software and Multimedia Laboratory

REALM AWARE SOURCE ROUTING IN PRACTICE

Extending the Capabilities of IPv4 Networking

Master’s Thesis

Samuel Korpi

Telecommunications Software and Multimedia Laboratory

Espoo 2007

HELSINKI UNIVERSITY OF TECHNOLOGY
ABSTRACT OF MASTER’S THESIS
Department of Computer Science and Engineering
Degree Programme of Computer Science and Engineering

Author: Samuel Korpi

Title of thesis: Realm Aware Source Routing in Practice – Extending the Capabilities of IPv4 Networking

Date: Dec. 6, 2007
Pages: 14 + 93
Professorship: Communications software
Code: T-110
Supervisor: Professor Antti Ylä-Jääski
Instructor: Ph.D. Sasu Tarkoma

A major, and long discussed, demerit of the current Internet architecture is the insufficient IPv4 address space, IPv4 being one of the fundamental protocols used in the Internet.

As more and more devices become Internet-capable, the feasibility of static IPv4 addressing (i.e., assigning each node a unique IP address even though that particular node might not always be online) is reduced. CIDR allows more effective address allocation, DHCP the dynamic reallocation of the addresses, and NAT enables address hiding and private networks. On the other hand, these techniques reduce the availability of the hosts in the Internet. The fundamental point of IPv4 connectivity is that one can identify a node by its IP address. However, if the address changes dynamically or is hidden, this connectivity model is broken.

In this thesis, several existing solutions to the issue are discussed: IPNL, IPv4+4, HIP and NUTSS, among others. In particular, we concentrate on one solution, Realm Aware Source Routing (RASR). A proof-of-concept implementation of RASR is introduced and its feasibility analyzed and compared with other solutions.

We will show that RASR has potential, though also some limitations. In other words, it is not yet a complete solution. Further research on the topic is encouraged.

Keywords: IPv4, IPv6, address space depletion, static addressing, dynamic addressing, NAT, RASR

Language: English

HELSINKI UNIVERSITY OF TECHNOLOGY
ABSTRACT OF MASTER’S THESIS (FINNISH ABSTRACT, IN TRANSLATION)
Department of Computer Science and Engineering
Degree Programme of Computer Science and Engineering

Author: Samuel Korpi

Title of thesis: Realm Aware Source Routing in Practice – Extending the Capabilities of IPv4 Networking

Date: December 6, 2007
Pages: 14 + 93
Professorship: Communications software
Code: T-110
Supervisor: Prof. Antti Ylä-Jääski
Instructor: Ph.D. Sasu Tarkoma

One of the most critical and longest-discussed weaknesses of the current Internet architecture is the insufficiency of the address space of IPv4, one of the basic protocols of the Internet.

The number of devices connected to the Internet grows continuously, and it is no longer worthwhile to give every device a unique fixed address (so-called static addressing), especially as many devices are connected to the network only occasionally. Classless allocation of address ranges (CIDR) allows the address space to be utilized better, DHCP provides dynamic addressing, and NAT provides address hiding and private networks. On the other hand, these methods reduce the visibility of devices on the Internet. IPv4 is based on the idea that a device can be identified by its IP address. If, however, the address changes dynamically or is hidden, identification fails and the reachability of devices deteriorates.

In this thesis we present several existing solutions to the above problem, among them IPNL, IPv4+4, HIP, and NUTSS. In addition, we concentrate more closely on one solution, called Realm Aware Source Routing (RASR). We go through an example implementation of RASR, analyze its usability, and compare it with other solutions.

We will show that RASR has potential, but also weaknesses of its own. It is therefore not yet a complete solution, and the topic calls for further research.

Keywords: IPv4, IPv6, address space depletion, static addressing, dynamic addressing, NAT, RASR

Language: English

Acknowledgements

The writing of this thesis has been a long project. I would like to thank my supervisor, Prof. Antti Ylä-Jääski, and my instructor, Ph.D. Sasu Tarkoma, for their patience and valuable feedback and help throughout the writing process. Furthermore, thanks to Mr. Mikael Latvala from the Nokia Research Center (NRC) for the technical assistance and equipment necessary to complete the project.

One person deserving a very special thank you is my elementary school teacher, Mr. Ilmari Paukkunen, whose personal interest in computers and willingness to teach others far surpassed what the job required. Through his guidance I took my first steps in the realm of computer science.

I also want to thank my current employer, KPMG Finland, for providing access to their testbed network which allowed me to finalize the evaluation section of the thesis.

And finally, thanks to my family and friends. You are the reason I am where I am now. Without your support this thesis probably never would have seen the light of day.

Espoo, 6th of December, 2007

Samuel Korpi


Abbreviations and Acronyms

AId       Association ID
ALG       Application Level Gateway
BDA       Borrowed Destination Address
BGP       Border Gateway Protocol
BSA       Borrowed Source Address
ccTLD     Country Code Top Level Domain
CIDR      Classless Inter-Domain Routing
DHCP      Dynamic Host Configuration Protocol
DNAT      Destination NAT
DNS       Domain Name System
DUA       Destination Universal Address
EHIP      End Host IP address
EID       Endpoint Identifier
EL        Endpoint Label
FARA      Forwarding directive, Association, and Rendezvous Architecture
FD        Forwarding Directive
fDS       FARA Directory Service
FQDN      Fully Qualified Domain Name
FTP       File Transfer Protocol
FW        Firewall (also fw)
GW        Gateway (also gw)
HI        Host Identifier
HIP       Host Identity Protocol
HTTP      Hypertext Transfer Protocol
I-D       Internet-Draft
ICE       Interactive Connectivity Establishment
ICMP      Internet Control Message Protocol
IETF      Internet Engineering Task Force
IF        Interstitial Function
IP        Internet Protocol
IPNL      IP Next Layer
ISP       Internet Service Provider
LAN       Local Area Network
LF        line feed
LNA       Layered Naming Architecture
LSB       Least Significant Bit
MIDCOM    Middlebox Communications
MRIP      Middle Realm IP address
NAPT      Network Address Port Translation
NAT       Network Address Translator
NLA       Network Layer Address
NCC       Network Coordination Centre
NSIS      Next Steps in Signaling
NSLP      NSIS Signaling Layer Protocol
NUTSS     NAT, URI, Tunnel, SIP and STUNT
OID       Object ID
OS        Operating System
P2P       Peer-to-Peer
RASR      Realm Aware Source Routing
RFC       Request for Comments
RG        Realm Gateway
RI        Rendezvous Information
RN        Realm Number
RRA       Root Realm Address
RRV       Root Realm Visited
RSIP      Realm Specific IP
RSA-IP    Realm Specific Address IP
RSAP-IP   Realm Specific Address and Port IP
RTT       Round-trip time
SIP       Session Initiation Protocol
SID       Service Identifier
SON       Service Overlay Network
SSH       Secure Shell
STUN      Simple Traversal of UDP Through NATs
STUNT     Simple Traversal of UDP Through NATs and TCP too
SUA       Source Universal Address
TCP       Transmission Control Protocol
TLD       Top Level Domain
TTL       time-to-live
TURN      Traversal Using Relays around NAT
UA        Universal Address
UDP       User Datagram Protocol
ULD       User Level Descriptor
UML       User Mode Linux
URI       Uniform Resource Identifier
VoIP      Voice over IP

B         byte; 8 bits (b)
kB        kilobyte; 1 024 (2^10) bytes
MB        megabyte; 1 024 (2^10) kilobytes, 1 048 576 (2^20) bytes

Contents

Abbreviations and Acronyms

1 Introduction
    1.1 Background and Motivation
    1.2 Problem Statement
    1.3 Scope
    1.4 Organization of the Thesis

2 Existing Solutions
    2.1 Current Best Practice – NAT + NAT Traversal
    2.2 New IP Version – IPv6
    2.3 Extending the Current Internet Architecture
        2.3.1 IP Next Layer (IPNL)
        2.3.2 IPv4+4
        2.3.3 Middlebox Communications (MIDCOM)
        2.3.4 Next Steps in Signaling (NSIS)
        2.3.5 Plutarch
        2.3.6 Realm Specific IP (RSIP)
        2.3.7 Teredo
    2.4 Architectural Modifications – Separating Identity from Location
        2.4.1 Forwarding directive, Association, and Rendezvous Architecture (FARA)
        2.4.2 Host Identity Protocol (HIP) Architecture
        2.4.3 The Nimrod Routing Architecture
        2.4.4 NUTSS: NAT, URI, Tunnel, SIP and STUNT
        2.4.5 A Layered Naming Architecture for the Internet
    2.5 Service-oriented Approach

3 Realm Aware Source Routing (RASR)
    3.1 Topology
    3.2 Communications Overview
    3.3 Protocol

4 RASR Proof-of-concept Implementation
    4.1 Environment
    4.2 Modifications to the Linux Kernel
        4.2.1 Affected header files (.h)
        4.2.2 Affected code files (.c)
        4.2.3 Using the Kernel Configuration System (Kconfig)
    4.3 RASR Support for Applications
        4.3.1 Wireshark
        4.3.2 Ping
        4.3.3 GNU Netcat
        4.3.4 OpenSSH
    4.4 Testing and debugging
    4.5 Limitations
        4.5.1 Limitations of RASR in General
        4.5.2 Limitations of Our RASR Implementation
    4.6 Optimization

5 Evaluation
    5.1 Testbeds
        5.1.1 Testbed 1: Private Testbed, RASR on Embedded Devices
        5.1.2 Testbed 2: A Connection Over the Internet
    5.2 RASR Implementation Performance Metrics
        5.2.1 Reachability Using Ping
        5.2.2 File Transfer Using Netcat

6 Discussion
    6.1 RASR Implementation Performance Analysis
        6.1.1 Reachability (Ping)
        6.1.2 File Transfer (Netcat)
    6.2 Solution Comparison
    6.3 The Future Internet – Based on the Current Architecture or a Completely New Design?

7 Conclusions and Future Work

Bibliography

Appendices
    A RASR Communications Flow Chart
    B RASR Implementation Flow Chart
    C Linux Kernel Debugging using User Mode Linux
    D The Socket Buffer (sk_buff) Kernel Structure
    E Compatibility Patch for the Original Ping
    F RASR Testbeds – Routing Tables
    G RASR Testbeds – Netfilter settings
    H Reachability Test Results
    I Solution Comparison Table

List of Tables

2.1 Hosts under ccTLDs within the RIPE NCC service region. 2007. [2]
2.2 The main components of the NUTSS architecture.
2.3 The naming layers of LNA.
2.4 The design principles of LNA. [4]
5.1 The ping test cases.
5.2 The ping test results. Values are RTTs in milliseconds (ms).
5.3 The ping test results for test case E. Values are average RTTs in milliseconds (ms).
5.4 The Netcat file transfer test cases.
5.5 The file transfer test results. Values are transfer times in seconds (s).
6.1 File transfer with RASR using netcat (150 ms send delay).

List of Figures

2.1 An example of a LAN connected to the Internet via a NAT box.
2.2 A fairly common network scenario with both NATted and public hosts.
2.3 An IPv4/IPv6 network scenario utilizing Teredo.
3.1 An example realm hierarchy for RASR.
3.2 An example RASR communications scenario.
3.3 SUA IPv4 Option. Variable length.
3.4 DUA IPv4 Option. Variable length.
4.1 The file listing of /proc/sys/net/ipv4/ from a RASR enabled system.
4.2 A screenshot from Wireshark with RASR support enabled.
4.3 A couple of RASR test scenarios involving virtual UML hosts.
4.4 The Universal Address (UA) construction illustrated.
5.1 Testbed 1 – A private RASR testbed.
5.2 Testbed 2 – A connection over the Internet.
6.1 Normal ping vs. RASR ping on a private testbed.
6.2 Normal ping vs. RASR ping over the Internet.
6.3 Reachability test (ping) against www.tkk.fi (130.233.240.9).
6.4 Reachability test (ping) against www.uoa.gr (195.134.100.100).

Chapter 1

Introduction

1.1 Background and Motivation

The Internet^1 is one of the modern-day success stories. It is quite different from what the early Internet pioneers envisioned, with more applications and types of terminal devices than one can count. Still, the fundamentals have hardly changed. The specifications for the core Internet protocols, Internet Protocol (IP) [36], Transmission Control Protocol (TCP) [37], and User Datagram Protocol (UDP) [34], among others, were published as Request for Comments (RFC) documents^2 in the early 1980s and are still in use. More importantly, their core functionality has not changed throughout the years. It is a tribute to the ingenuity behind those protocols that they have been able to support the ever-growing network.

In the end, however, there are always bound to be limitations to what can be achieved with existing technologies. We cannot predict the future, and so the limits we first come across are often ones that never crossed the minds of the original inventors. We have seen that happen with the conventional memory limit (i.e., 640 kB) for IBM compatible Personal Computers (PCs)^3 and with the Y2K^4 problem, just to mention a couple.

^1 We use the term Internet, with a capital 'I', to refer to the global telecommunications network utilizing the TCP/IP protocol suite. An internet, on the other hand, is a general term referring to a connected group of networks, not restricted to using TCP/IP.

^2 An RFC is a freely available informational document concerning Internet technologies, published by the Internet Engineering Task Force (IETF). Some RFCs have reached standard status.

^3 The 640 kB memory limit traces back to the 16-bit Intel 8086 microprocessor, which could only address 1 MB of memory, part of which was reserved for the Operating System (OS). It was plenty in 1978, when the processor was introduced, but quickly became a bottleneck.

With Internet technologies, one such problem is the size of the address space of IPv4. For full connectivity, each node connected to the Internet has to have a unique identifier, an IP address. The limiting factor of the address space is the address length, which is defined to be 32 bits for IPv4. Taking certain reserved addresses into account, 32 bits will give about 3.7 billion available addresses, in theory [52]. This sounds like a very big number, but given that the addresses have to be uniquely allocated (i.e., you cannot simply choose a random IP whenever you need one), and the fact that more and more devices are being used to connect to the Internet, the limit is very real.
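As a rough cross-check of the "about 3.7 billion" figure, the sketch below subtracts a set of reserved blocks from the full 32-bit space. The list of blocks is an assumption made for illustration and is not taken from [52]; a different choice of reserved ranges gives a slightly different total.

```python
import ipaddress

# Blocks treated as unavailable for public unicast use.
# NOTE: this particular list is an assumption made for illustration only.
RESERVED_BLOCKS = [
    "0.0.0.0/8",       # "this network"
    "10.0.0.0/8",      # private use
    "127.0.0.0/8",     # loopback
    "169.254.0.0/16",  # link-local
    "172.16.0.0/12",   # private use
    "192.168.0.0/16",  # private use
    "224.0.0.0/4",     # multicast
    "240.0.0.0/4",     # reserved for future use
]


def usable_ipv4_addresses(reserved=RESERVED_BLOCKS):
    """Size of the 32-bit address space minus the (non-overlapping) reserved blocks."""
    total = 2 ** 32
    reserved_count = sum(ipaddress.ip_network(block).num_addresses for block in reserved)
    return total - reserved_count


if __name__ == "__main__":
    print(f"{usable_ipv4_addresses():,} addresses remain")
```

With these blocks removed, roughly 3.7 of the 4.3 billion possible 32-bit values remain, which matches the order of magnitude quoted above.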

The address space depletion problem was recognized as early as the early 1990s, and the search for feasible alternatives has been on ever since [17]. The first specification for a new IP version, IPv6, was published in 1995 as RFC 1883, and later obsoleted by RFC 2460 in 1998. However, the adoption of the new protocol has been a lot slower than expected. An article by Mark Weiser, “Whatever happened to the next-generation Internet?” [52], analyzes the situation in 2001. From the article, one compelling reason for the poor success of the new IP version stands out: “The world is still nicely served by v4”. Even though the interest in IPv6 has grown since then [8], this claim still holds as far as the general public is concerned: WWW, email, online gaming, VoIP, and other popular Internet applications work, which is enough for most [25].

IPv4 just by itself would probably not have survived this far. However, it is a flexible protocol, allowing quick fixes, such as Network Address Translator (NAT) [44], to be invented. NAT, first defined in 1994 [19], together with technologies like Dynamic Host Configuration Protocol (DHCP) [18] and Classless Inter-Domain Routing (CIDR) [24], has become the current best practice for pushing the limits of IPv4. We use the term quick fix because NAT does not really address the underlying problems of IPv4. Furthermore, NAT has problems of its own, which we will discuss further in Section 2.1.

The modern Internet is a commercial entity affected by business realities. In other words, whichever solution can be deployed with the least money and effort is the most likely candidate to be widely accepted. Comparing NAT and IPv6 from this perspective, there can hardly be any doubt about the winner.

^4 Y2K stands for the year 2000 (2 kilo). Many computer systems used (and still use) only two digits to store the year information (e.g., 97 to represent 1997). At the turn of the millennium this was feared to cause various problems in systems not prepared for it.


1.2 Problem Statement

The current Internet architecture cannot fully support the demands of the network and its applications anymore. As more and more devices become Internet-capable, the feasibility of static IPv4 addressing is reduced. CIDR allows more effective address allocation, DHCP the dynamic reallocation of the addresses, and NAT enables address hiding and private networks. On the other hand, these techniques reduce the availability of the hosts in the Internet. The fundamental point of IPv4 connectivity is that one can identify a node by its IP address. However, if the address changes dynamically or is hidden, this connectivity model is broken.

In the long term, more drastic changes will be required, like the complete adoption of IPv6. This, however, will not happen overnight. So, in this thesis we will research ways to utilize the existing network infrastructure to the best of its ability in modern environments. Different solutions are discussed, some widely in use, some hardly more than ideas.

One specific solution, Realm Aware Source Routing (RASR) [29], is examined more closely. It is compared with the other solutions and its feasibility is discussed.

1.3 Scope

The focus of the thesis is on the network layer (i.e., IP layer). Issues like TCP connectivity and mobility, while related to the problem statement, are not extensively covered.

1.4 Organization of the Thesis

We begin by covering some of the existing solutions to the problem statement in Chapter 2.

Chapter 3 provides a description of RASR. This is followed by a detailed description of a proof-of-concept implementation of RASR in Chapter 4.

Chapter 5 begins with a description of the testbed networks used to test our RASR implementation. The rest of the chapter introduces some test cases and the corresponding test results. In Chapter 6 we discuss the findings of the previous chapter.

Finally, in Chapter 7 we finish the thesis with some conclusions and ideas for future work.

Chapter 2

Existing Solutions

In this chapter, we will present some of the existing solutions to the problem statement (see Section 1.2). The solutions are divided into five categories:

1. The current best practice: NAT + NAT Traversal. This will be covered in Section 2.1.

2. A completely new IP version, IPv6, meant to replace IPv4. See Section 2.2.

3. Solutions building on the existing Internet architecture. Within this category, seven different solutions will be discussed in Section 2.3. Common to all of these solutions is that they do not aim to replace IPv4 but instead build new functionality either on top of IPv4 or somehow in parallel so that IPv4 would still be supported as a part of the new architecture.

4. Solutions aiming to separate identity from location. Five solutions belonging to this category will be presented in Section 2.4.

5. Finally, quite a different idea, the service-oriented approach, is presented in Section 2.5.


2.1 Current Best Practice – NAT + NAT Traversal

By far the most common way of dealing with the IPv4 address space depletion problem is to use NAT. In the following, we present a common NAT scenario.

You have received a single public IP address from your Internet Service Provider (ISP). Most likely it will be dynamically allocated, but a static IP address might also be a possibility, at least for an extra fee.

You would like to have more than one computer connected to the Internet, but do not want to pay the price for an extra IP address. The way to do this is either to configure one of your computers to handle NAT functionality, or to invest in a separate NAT device.

Now you can build an internal network and use the NAT box (i.e., the device handling the NAT functionality) as the router between your network and the Internet. As your network will be hidden by the NAT box, you are free to give addresses to the computers in your Local Area Network (LAN) as you like^1. Figure 2.1 shows one such setup.

^1 It is recommended, though, to use addresses from specific ranges reserved for private use. [38]

Figure 2.1: An example of a LAN connected to the Internet via a NAT box.

Let us take a closer look at how NAT works via a use case:

1. Host A in Figure 2.2 wants to contact server S on the Internet. It begins by sending a packet with source=A and destination=S to the default router R, which happens to be a NAT box.

2. The NAT box replaces the source address in the packet’s IP header with its own public IP address, stores information about the original source so that reply packets can be routed correctly, and sends the packet forward. This is called Basic NAT. [44]

   Now, Basic NAT works fine as long as only one host from the private network wants to connect to the Internet at a time^2. But when host B, for example, wants to connect to server S at the same time as host A does, the NAT box has to have some way of keeping the two different connections separate. This can be achieved by extending the NAT to do port translation in addition to the address translation. That is, the NAT box stores, in addition to the original source address, also the source port (or some other connection identifier). It then replaces this identifier with a uniquely selected one to avoid collisions. This variant of NAT is called Network Address Port Translation (NAPT). [44]

3. Server S receives the packet and sends a reply with source=S and destination=R, as it has no knowledge of the original source, hidden behind the NAT box.

4. The NAT box R receives the reply packet, checks its database and finds a match for that particular connection. It then replaces the destination address, and destination port (if applicable), in the packet’s headers with A’s IP address (and original port), and sends the packet to the internal network.

5. Host A receives the reply packet.

^2 Of course, if the NAT box has more than one public IP address, they can be used to allow multiple simultaneous connections. [44]
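The steps above can be sketched as a small translation table. This is a minimal illustration, not code from any real NAT implementation: the class and field names (`NaptBox`, `Packet`), the starting port 40000, and the documentation-range addresses standing in for A, B, R and S are all assumptions.

```python
from collections import namedtuple
from itertools import count

# A packet reduced to the fields NAPT rewrites (hypothetical, for illustration).
Packet = namedtuple("Packet", "src sport dst dport")


class NaptBox:
    """Toy NAPT box R: rewrites source address/port outbound, restores them inbound."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self._ports = count(first_port)  # source of unique public-side ports
        self._out = {}                   # (private ip, private port) -> public port
        self._in = {}                    # public port -> (private ip, private port)

    def outbound(self, pkt):
        """Step 2: replace the source with the public address and a unique port."""
        key = (pkt.src, pkt.sport)
        if key not in self._out:
            port = next(self._ports)
            self._out[key] = port
            self._in[port] = key
        return Packet(self.public_ip, self._out[key], pkt.dst, pkt.dport)

    def inbound(self, pkt):
        """Step 4: look up the stored mapping and restore the original destination."""
        mapping = self._in.get(pkt.dport)
        if mapping is None:
            return None  # no stored state for this port: the packet cannot be routed
        host, port = mapping
        return Packet(pkt.src, pkt.sport, host, port)


if __name__ == "__main__":
    nat = NaptBox("203.0.113.1")                                       # R
    to_s = nat.outbound(Packet("192.168.0.2", 5000, "198.51.100.7", 80))  # A -> S
    reply = Packet("198.51.100.7", 80, to_s.src, to_s.sport)             # S -> R
    print(nat.inbound(reply))                                          # delivered to A
```

Hosts A and B may use the same private source port; the box keeps their connections apart by giving each a different public-side port.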

In this direction (i.e., client behind NAT, server on the Internet), everything works in most cases. However, some application protocols, like the File Transfer Protocol (FTP), transfer IP address and possibly TCP/UDP port information as part of the protocol payload [45]. During normal NAT operation, the packet headers are modified but the payload is left untouched, resulting in connection failure. To remedy the situation, application-specific NAT functionality is necessary, implemented by Application Level Gateways (ALGs)^3 [45]. In practice, all current NAT implementations contain a suitable ALG for handling FTP connections.

Figure 2.2: A fairly common network scenario with both NATted and public hosts.

Problems arise when the server is behind NAT: without the connection/address pairing that is created by an outgoing packet, the NAT box has no idea where to route incoming packets. NAPT takes this into account by allowing static mappings where a certain inbound port^4 is associated with a specific host in the internal network [44], a process commonly known as port mapping.

^3 Here the terminology is a bit inconsistent: both Application Level Gateway and Application Layer Gateway are associated with the acronym ALG.

^4 Transport layer protocols (e.g., TCP and UDP) utilize port numbers to forward the packet to the correct application. The same idea can be used to forward packets to different hosts based on the port number.

For example, if there is a web server behind a NAT (e.g., host B in Figure 2.2), ports 80 and 443 can be mapped to that specific host. This means that every packet destined to either port 80 or port 443^5 is automatically forwarded to the web server. A client outside the internal network, like host C in Figure 2.2, can now retrieve pages from the web server simply by using the NAT box as the destination.
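A static port map of this kind can be pictured as a lookup table on the NAT box. The sketch below is only illustrative; the table layout, the function name `forward_inbound`, and the private address used for host B are assumptions.

```python
# Static port map on the NAT box: public port -> (internal host, internal port).
# The address 192.168.0.3 stands in for host B (an assumption for illustration).
PORT_MAP = {
    80: ("192.168.0.3", 80),    # HTTP        -> web server on host B
    443: ("192.168.0.3", 443),  # secure HTTP -> web server on host B
}


def forward_inbound(dst_port, port_map=PORT_MAP):
    """Return the internal (host, port) for an unsolicited inbound packet,
    or None when no static mapping exists for the destination port."""
    return port_map.get(dst_port)


if __name__ == "__main__":
    print(forward_inbound(80))  # forwarded to the web server
    print(forward_inbound(22))  # no mapping, so nothing to forward to
```

The table also shows the limitation that matters for the discussion that follows: each public port can point to only one internal host, and the mapping has to be configured in advance.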

A new class of Internet applications are the so-called peer-to-peer (P2P) applications. Skype, a popular Voice over IP (VoIP)^6 application, and many file sharing applications (e.g., BitTorrent) belong to this category. With P2P software, the traditional client-server architecture^7 does not apply anymore. A P2P application can usually act both as a client and a server. Port mapping does not really work in this kind of dynamic scenario, which brings us to the various NAT Traversal solutions.

Here we will only consider one such solution, namely Simple Traversal of UDP Through NATs (STUN) [42].

From the abstract of [42]: “STUN is a lightweight protocol that allows applications to discover the presence and types of NATs and firewalls between them and the public Internet”. In order to achieve this, a STUN client sends Binding requests to a STUN server, located somewhere on the public Internet. Based on the replies, the aforementioned information can then be deduced. [42]

For example, let us once again consider the network scenario in Figure 2.2. Assume that host A is the STUN client, and host S the STUN server. Now, A sends a Binding request to S. In the reply message, S includes the source address/port pair from the request. A can then compare the address from the reply with its own address; if they differ, A is behind at least one NAT. In addition, A now has knowledge of a public address/port mapping that can be used to connect to A from the public Internet. The NAT type, however, affects the usability of this mapping. [42]
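The comparison that host A performs can be sketched as follows. This is a simplified model rather than the STUN wire format: no messages are sent, the reply is a plain dictionary, and the function names and example addresses are assumptions.

```python
def stun_binding_reply(observed_source):
    """Server S: echo back the source address/port pair seen on the request."""
    return {"mapped_address": observed_source}


def behind_nat(local_address, reply):
    """Client A: a mapped address that differs from the local one implies
    at least one NAT between the client and the server."""
    return reply["mapped_address"] != local_address


if __name__ == "__main__":
    local = ("192.168.0.2", 5000)      # A's own address/port (assumed)
    observed = ("203.0.113.1", 40000)  # what S sees after translation by R (assumed)
    reply = stun_binding_reply(observed)
    print("behind NAT:", behind_nat(local, reply))
    print("public mapping:", reply["mapped_address"])
```

Whether the learned public mapping is actually usable by other hosts depends on the NAT type, as noted above; the sketch does not model that.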

The main strength of STUN is that it does not require the existing NAT boxes to be modified in any way. On the other hand, it relies on the availability of STUN servers.

^5 Ports 80 and 443 are well-known ports used by Hypertext Transfer Protocol (HTTP) and secure HTTP, respectively.

^6 VoIP is commonly referred to as Internet telephony.

^7 Client-server architecture is the traditional connection model used on the Internet, in which the connection parties are either clients (e.g., a web browser) requesting services (e.g., a web page), or servers (e.g., a web server) providing the requested services.

STUN is currently being redefined as Session Traversal Utilities for NAT [39]. The new definition is Work in Progress and published as an Internet-Draft (I-D)^8.

Also worth mentioning is an extension of the new STUN protocol, called Traversal Using Relays around NAT (TURN) [40]. It is Work in Progress, and only an Internet-Draft has been published so far.

In TURN, the TURN server (i.e., a STUN server implementing the TURN functionality) acts as a relay between the TURN client (i.e., a STUN client implementing the TURN functionality) and its peers. Looking back to our example scenario in Figure 2.2, where host A is the TURN client and host S the TURN server, assume that host C is the peer trying to communicate with A. With TURN, all packets between A and C will be relayed through S. In practice this means that A will be communicating with the same outside host, S, even if the actual peer changes. This, in turn, makes it possible for the relay usage to function in scenarios where plain STUN does not. Unfortunately, relaying takes up a lot of resources from the TURN server, making the solution less attractive. [40]
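The relaying idea can be illustrated with a toy model in which hosts are just names. It does not follow the TURN message format; the class `TurnRelay` and the helper `sources_seen_by` are hypothetical and exist only to show that the client always sees the relay as its communication partner.

```python
class TurnRelay:
    """Toy relay S: every payload between a client and a peer takes two hops."""

    def __init__(self, address):
        self.address = address

    def deliver(self, sender, receiver, payload):
        """Return the two hops as (source, destination, payload) tuples."""
        return [
            (sender, self.address, payload),    # sender -> relay
            (self.address, receiver, payload),  # relay  -> receiver
        ]


def sources_seen_by(client, relay, peers):
    """Collect the source addresses the client observes when each peer sends to it."""
    seen = set()
    for peer in peers:
        hops = relay.deliver(peer, client, b"hello")
        seen.add(hops[-1][0])  # source of the hop that actually reaches the client
    return seen


if __name__ == "__main__":
    relay = TurnRelay("S")
    print(sources_seen_by("A", relay, ["C", "D"]))  # only the relay appears
```

The same model makes the cost visible: every payload crosses the relay twice, which is where the resource load on the server comes from.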

2.2 New IP Version – IPv6

IPv6 [15], the new IP version meant to replace IPv4, was introduced in the mid 1990s. Even now, more than a decade later, the usage of IPv6 is extremely small compared to IPv4.

Already in Section 1.1 we introduced what we consider to be one of the main reasons for the slow adoption of IPv6: IPv4 still works well enough, so why change? But the hard numbers, showing what the situation really is today, are missing. We do not know exactly how many IPv4 vs. IPv6 hosts there are on the Internet or what the ratio between IPv4 and IPv6 traffic is. Nor could we, really. One can just make educated guesses, based on the available information (e.g., Border Gateway Protocol (BGP) routing tables and DNS data).

A complete IPv4 vs. IPv6 analysis is worth another thesis. Here we will give just one comparison, using DNS data. Most of human interaction with the Internet is via DNS, as IP addresses are just too complicated to remember. So, it makes sense that a host count using DNS data would provide some useful metrics.

^8 An Internet-Draft (I-D) is a working document of IETF or some other group, often considered as a precursor to an RFC. It has no formal status, and a limited lifetime.

RIPE^9, the European^10 Regional Internet Registry (RIR)^11, provides such data via its Hostcount service, which is currently being replaced by Hostcount++. Unfortunately, the old Hostcount service does not provide IPv6-related data, and the new service is still in beta, so the information has to be taken with a lot of caution. Table 2.1 provides a summary of the early Hostcount vs. Hostcount++ statistics, spanning June to October 2007 [2]. Keeping in mind that the data might not be accurate, we can see without any doubt that IPv6 hosts still compose only a small fraction of all Internet hosts. Of course, a lot of test networks etc. are missing from these statistics, but then again they are not important to a common Internet user.

                  Hostcount      Hostcount++    Hostcount++
                  IPv4           IPv4           IPv6
June 2007         21,422,036     51,578,365     31,985
July 2007         22,947,526     69,885,400     54,577
August 2007       22,113,335     88,192,230     77,066
September 2007    22,852,505     105,899,172    82,552
October 2007      23,920,808     113,127,697    38,722

Table 2.1: Hosts under ccTLDs within the RIPE NCC service region. 2007. [2]
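To put a number on "a small fraction", the Hostcount++ columns of Table 2.1 can be turned into an IPv6 share per month. The sketch below only restates the table; the variable and function names are ours, and the caveats about the accuracy of the beta data apply unchanged.

```python
# Hostcount++ figures from Table 2.1: month -> (IPv4 hosts, IPv6 hosts).
HOSTCOUNT_PP = {
    "June 2007": (51_578_365, 31_985),
    "July 2007": (69_885_400, 54_577),
    "August 2007": (88_192_230, 77_066),
    "September 2007": (105_899_172, 82_552),
    "October 2007": (113_127_697, 38_722),
}


def ipv6_share(ipv4_hosts, ipv6_hosts):
    """Fraction of all counted hosts that are IPv6 hosts."""
    return ipv6_hosts / (ipv4_hosts + ipv6_hosts)


if __name__ == "__main__":
    for month, (v4, v6) in HOSTCOUNT_PP.items():
        print(f"{month}: {ipv6_share(v4, v6):.3%}")
```

In every month covered by the table, the IPv6 share stays below one tenth of one percent of the counted hosts.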

The strength, and at the same time the weakness, of IPv6 is that it is not anextension of IPv4 but a completely rewritten protocol. The header format,for example, has changed drastically to facilitate the 128 bit address length(compare to the 32 bits in IPv4) [15]. What makes ISPs think twice beforeinvesting in IPv6 is the fact that IPv6 requires support from every networkelement (i.e., hosts and routers). It is possible to tunnel IPv6 inside IPv4 andthus travel through “IPv4 clouds” in the network [9], though problems arisewhen NAT is introduced into the picture. In Section 2.3.7 we will present anenhanced tunneling solution, Teredo [28], aiming to solve these issues.

In any case, adopting IPv6 requires a lot of effort, and as IPv4 still works well enough with NAT, ISPs are in no great hurry to get the new protocol supported.

9 http://www.ripe.net/

10 Including the Middle East and parts of Central Asia.

11 An Internet Number Resources (e.g., IP addresses) coordinator.



2.3 Extending the Current Internet Architecture

In this section we cover some proposed ways of extending the current Internet architecture. Common to these solutions is that they aspire to continue the support for IPv4 networking either directly or as a part of the new architecture. In other words, they take into account the reality that any changes to a network as large as the Internet will have to be incremental.

RASR, covered in Chapters 3 and 4, also belongs to this category.

2.3.1 IP Next Layer (IPNL)

To fully understand the idea behind the IP Next Layer (IPNL) and similar solutions, the term address realm, or simply realm, as we will call it later on, has to be introduced. Realm, in this context, refers to a group of networks12 all sharing the same address space [50]. In the current Internet architecture, the public IPv4 network would be one realm, the IPv6 network another realm, and each private IPv4 network a realm as well. It is worth noting that the realms may be overlapping.

In the IPNL topology, there are two types of realms: a single, globally-addressed middle-realm corresponding to the public IPv4 network, and many private realms corresponding to the private IPv4 networks hidden behind NAT boxes. The routers connecting realms to each other are called nl-routers. [22]

IPNL specifies a new protocol layer between the network layer (i.e., IP) and the transport layer (i.e., TCP/UDP). In practice, this will show as a new header between the IP header and the transport protocol header. The new header stores IPNL address information, among other things. IPNL addresses extend the IPv4 addressing scheme to allow routing through multiple realms. The address consists of two IPv4 addresses, a Middle Realm IP address (MRIP) and an End Host IP address (EHIP). EHIP specifies the location of the host in the private realm connected to the nl-router given by MRIP. To allow for multiple private realms to be connected to a certain nl-router, there is a third parameter called a Realm Number (RN) in the IPNL address. [22]
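To make the three-part address concrete, the following Python sketch models an IPNL address as a simple record. The class and field names are ours, and the 16-bit width chosen for RN is an assumption made for illustration only; [22] defines the actual header layout.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class IPNLAddress:
    """Illustrative IPNL address: MRIP + RN + EHIP (names are ours)."""
    mrip: IPv4Address  # address of the nl-router in the middle realm
    rn: int            # selects one private realm behind that nl-router
    ehip: IPv4Address  # address of the end host inside the private realm

    def pack(self) -> bytes:
        # Assumption: RN is carried as a 16-bit big-endian integer.
        return self.mrip.packed + self.rn.to_bytes(2, "big") + self.ehip.packed


# Example values taken from documentation and private address ranges.
addr = IPNLAddress(IPv4Address("192.0.2.1"), 7, IPv4Address("10.0.0.5"))
packed = addr.pack()
```

Two hosts in different private realms may share the same EHIP; in this sketch they remain distinguishable because the MRIP or the RN differs.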

As can be deduced from the previous paragraph, IPNL is an IPv4-specific solution.

12 Here, a group may contain one or more networks.

The IPNL-aware entities are hosts and NAT boxes. The latter function as the nl-routers. Normal IPv4 routers inside a realm do not require any modifications, as they only deal with the outermost header, the IPv4 header.

2.3.2 IPv4+4

IPv4+4, like IPNL, divides the Internet into address realms: a single public realm corresponding to the public IPv4 network and many private realms [50]. The difference is in terminology, public realm vs. middle-realm and realm gateway (RG) vs. nl-router, and in the addressing.

IPNL addresses consist of three parts: MRIP, EHIP and RN. IPv4+4 addresses, however, only have two parts: a public IPv4 address and a private IPv4 address concatenated together [50]. This has the limitation of natively supporting only two levels of addressing hierarchy13. For example, a private realm inside another private realm would not be possible. In [50] it is proposed that a third level, level 0, could be added by setting aside part of the current public IPv4 address space for this new level. This, however, would require global collaboration in order to work, and would unnecessarily complicate the realm gateway functionality.

On the protocol level, IPv4+4 adds a simple encapsulated header after the IP header, containing the extra address information [50]. This addition should not be viewed as a new protocol layer, as IPv4+4 needs the addressing information from the IP header as well. So, when talking about the 4+4 header, we actually refer to the IP header and the IPv4+4 addition as a whole [51].

2.3.3 Middlebox Communications (MIDCOM)

The idea behind the “Middlebox communication architecture and framework”, as described in RFC 3303 [46], is to reduce the intelligence required on the middleboxes14 (e.g., the NAT boxes) by introducing trusted third parties. These third parties, called MIDCOM agents, make it possible for some of the application intelligence to be moved away from the middleboxes, thus reducing their load. Communications between the middleboxes and MIDCOM agents are handled via the MIDCOM protocol.

13 All private realms would have to be directly connected to the public realm.

14 In this context, the term middlebox refers to network nodes providing firewall (FW) and NAT services. [46]


What makes MIDCOM relevant to our research is the extra processing capability provided by MIDCOM agents. One way of traversing NAT boxes, already mentioned in Section 2.1, is with the help of ALGs. The task of an ALG is to inspect incoming packets and perform the necessary steps (e.g., modify the packet payload) to ensure correct application functionality over the NAT box [45]. This, however, requires a lot of intelligence from the device providing the ALG functionality. Furthermore, the system has to be easy to update, as most of the intelligence required is application specific. Traditionally the NAT box itself had to provide this functionality, but with MIDCOM it can be outsourced.

The MIDCOM framework is extensible in nature [46]. Other possible uses for it, however, are outside the scope of this thesis.

2.3.4 Next Steps in Signaling (NSIS)

All data flowing through a telecommunications network can be roughly categorized as either the actual payload data or the signaling data supporting the communications. It is, however, not always easy to actually separate these two from each other.

In the Internet, there is no separate signaling layer15. Signaling data is transferred among payload data, sometimes as separate packets (e.g., Internet Control Message Protocol (ICMP) [35]), but most often within the packets containing actual payload data as well. IP routing is a good example of this kind of embedded signaling.

NSIS is a general-purpose signaling solution, aiming to provide a common framework for the different signaling needs in the Internet. [27]

Within the scope of this thesis, we only consider one NSIS Signaling Layer Protocol (NSLP)16, the NAT/FW NSIS NSLP. It is currently under development (i.e., Work in Progress) by the NSIS Working Group of the IETF, and so far published only as an Internet-Draft [49]. For simplicity, we will refer to this NSLP as NATFW signaling from now on.

NATFW signaling can be used to dynamically configure compatible NAT and firewall devices. For example, port mappings can be dynamically added and removed to facilitate P2P applications. The main requirements are that the devices in question understand the NATFW signaling messages and allow them to get through in the first place. In [49], many network scenarios are covered, each with its own special requirements.

15 In traditional circuit-switched telephone networks the signaling data (i.e., call setup) is separated from the payload data (i.e., voice). That is, they use different data paths, and are said to exist on different layers.

16 The NSIS framework contains two protocol layers, a Transport Layer providing application-independent transport services for signaling messages and a Signaling Layer containing the application-specific functionality. [27]

Roughly, the NATFW signaling process goes as follows: A host begins by sending a signaling message towards the destination host. Every firewall and NAT box along the route, assuming it is capable of interpreting the message, configures itself accordingly. Of course, in a realistic scenario there are bound to be some middleboxes which do not support NATFW signaling, making the situation more complex. [49]

An implementation and performance analysis of NATFW signaling is presented in an article by N. Steinleitner et al. [48]

2.3.5 Plutarch

The solutions we have presented thus far all regard the Internet as a homogeneous entity, to an extent. The network layer (i.e., the IP layer) masks the underlying differences of the physical networks, providing a common basis for communications.

With Plutarch [13], the authors J. Crowcroft et al. take another position. They consider a heterogeneous global network, which we will refer to as the global internet within the scope of this thesis. The global internet can be divided into contexts, each of which is a homogeneous network entity as far as addressing and naming services, among others, are concerned. In fact, the public IPv4 network would be considered as one context, as would the IPv6 network17. NATted LANs would constitute a context each, etc. It is worth noting, though, that a context might not utilize TCP/IP at all. [13]

The heterogeneous approach taken by Plutarch poses a new dilemma – how to handle communications between different contexts, when no common protocol layer exists? The authors of Plutarch propose to solve this by the introduction of interstitial functions (IFs), entities residing on the context borders. Theirs would be the responsibility to make any necessary modifications to the data flow, allowing it to pass context borders. Further complexity is added by the lack of a global naming/addressing scheme. In the current Internet architecture, the functionality of IFs is handled by the NAT boxes and 6to4 routers, among others. [13]

You may have noticed the similarity between our use of the term context and the term realm, defined in Section 2.3.1. For now, suffice it to say that realm is a more specific term, considering only addressing and mainly used in conjunction with IP networks.

17 The IPv6 network is public by definition, as every node has a unique identifier.

2.3.6 Realm Specific IP (RSIP)

According to RFC 3102 [7], “RSIP is intended as a alternative to NAT in which the end-to-end integrity of packets is maintained”.

To illustrate how RSIP manages this, let us return to our NAT scenario in Figure 2.2. This scenario can easily be converted to RSIP: the NAT box R is replaced by an RSIP Gateway, and host A is upgraded with RSIP client functionality, converting it to an RSIP Host. Hosts C and S are public IPv4 hosts. Host B is left as a legacy host, without RSIP support. To allow it to access the public Internet, the RSIP Gateway will continue providing NAT functionality as well. [7]

Now, to our scenario. As the RSIP Host A only has a private IP address, communications between it and a public host (e.g., S) are not directly possible. RSIP solves this by allowing the necessary resources (e.g., a public IP address) to be leased from the RSIP Gateway. Depending on how many resources the RSIP Gateway has to spare, two modes of operation have been defined: Realm Specific Address IP (RSA-IP) and Realm Specific Address and Port IP (RSAP-IP). [7]

RSA-IP is the simpler of the two. Here, each RSIP host requesting resources will be allocated a unique public IP address [7]. This is almost analogous to a DHCP server allocating public addresses, as in both cases the host “owns” the IP address it received for a certain period of time. However, the problem is that this method requires several public addresses to be available, which is usually not the case.

RSAP-IP adds port information to the lease (i.e., a host gets an address/port pair) [7]. This means many hosts can be served with just one public address. The downside is, of course, that the host will have to request more resources for every new connection. The situation is somewhat similar to the port mapping done in a NAT box, though more dynamic.
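The leasing idea behind RSAP-IP can be sketched in a few lines of Python. This is a toy model under our own assumptions, not the RSIP protocol itself: the class, its methods, the port range and the second RSIP host "D" are all hypothetical, and real leases would also carry lifetimes and be negotiated with protocol messages as described in [7].

```python
from ipaddress import IPv4Address


class RsapIpGateway:
    """Toy gateway leasing (address, port) pairs from one public address."""

    def __init__(self, public_address: str, ports: range):
        self.public_address = IPv4Address(public_address)
        self._free_ports = list(ports)
        self._leases = {}  # (address, port) -> name of the host holding it

    def lease(self, host: str):
        if not self._free_ports:
            raise RuntimeError("no address/port pairs left to lease")
        pair = (self.public_address, self._free_ports.pop(0))
        self._leases[pair] = host
        return pair

    def release(self, pair) -> None:
        # Return the port to the pool once the host no longer needs it.
        del self._leases[pair]
        self._free_ports.append(pair[1])


gateway = RsapIpGateway("192.0.2.1", range(40000, 40003))
lease_a = gateway.lease("A")  # RSIP Host A
lease_d = gateway.lease("D")  # a second, hypothetical RSIP host
```

Both hosts end up with the same public address but different ports, which is what lets a single public address serve many hosts.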

The actual packet transfer is done with the help of tunneling [7]. In our example, RSIP Host A would tunnel the packet destined to host S to the RSIP Gateway R, which would strip the outer headers and forward the packet. The reply packet would reverse the path, being first sent to R and then tunneled to A. It is worth noting that this method of tunneling the packets effectively removes the need to modify the data packet itself, thus preserving the end-to-end integrity.

2.3.7 Teredo

Though essentially an IPv6-specific solution, we will present Teredo [28] here, as it does not break legacy (i.e., IPv4) applications and as such can be considered an extension to the existing Internet architecture.

You may ask what purpose an IPv6-specific solution to the NAT issues serves – there should be no need for NAT once IPv6 is adopted, should there? The unfortunate fact, which should be clear by now, is that there can be no overnight shift to IPv6. That is, there will be a period when both IPv4 and IPv6 are in use, and so are NATs. Actually, we are currently living in that period, and nobody really knows when it will end, if ever.

Teredo is an IPv6-in-IPv4 tunneling solution including NAT traversal capabilities. The author, C. Huitema, however, stresses that it should only be considered as a last resort solution, due to the connection overhead it inflicts [28].

Figure 2.3 illustrates one scenario in which the usage of Teredo can be justified. Here we have separated the IPv4 and IPv6 Internet for clarity, though in practice they would be overlapping. In our scenario, let us assume that host A, located behind a NAT, wants to contact an IPv6 host C, directly connected to the IPv6 Internet.

A plain IPv6 connection between the hosts is out of the question, as host A has no IPv6 connectivity18. 6to4 tunneling would require that the NAT box handle the necessary 6to4 router functionality [9]. In our scenario this is not the case, leaving Teredo as a viable alternative.

Teredo relies on the help of two external entities, a Teredo Server and a Teredo Relay [28]. The necessary functionality could actually be provided with a single entity [28], but we will keep the distinction between the two to put emphasis on their different tasks.

The Teredo Server handles tasks similar to those of a STUN server, introduced in Section 2.1. It is used to discover the public IP address of the NAT box, and to initiate a NAT mapping, so that the Teredo host behind NAT can be reached from the outside network. [28]

18 With no IPv6 connectivity we refer to the absence of a valid IPv6 route out of the LAN; the host itself is assumed to be able to communicate via IPv6.


Figure 2.3: An IPv4/IPv6 network scenario utilizing Teredo.

The Teredo Relay advertises the Teredo IPv6 prefix19, and relays the packets between the communicating hosts, converting between IPv4 and IPv6 packets as needed. A single Teredo Relay usually serves only a small number of hosts. The relay functionality can also be included in the end host itself. [28]

2.4 Architectural Modifications – Separating Identity from Location

In the current Internet architecture, network nodes (i.e., hosts) are identified by their IP addresses. The IP address is also used to provide the necessary location information for packet routing. For mobile hosts this means that both their identity and location information change as the host moves. This is a problem if the host in question should be accessible from the outside network.

19 In order to be reachable by IPv6 hosts, the Teredo hosts need to have an IPv6 address. A specific Teredo prefix identifies the packet as a Teredo packet and directs it to an entity capable of handling it.


Separating identity from location gives the advantage of always being able to contact a specific host by using its identity. There just has to be some way of matching the identity with the current location to get the required routing information.

The Domain Name System (DNS) is a naming system separate from IP addresses, and could theoretically be used to provide the identity for an Internet host while the IP address would continue to provide the location information. However, the problem lies in keeping the DNS information continuously synchronized with the host’s location.

The solutions presented in this section define a couple of other methods for separating the identity from location.

2.4.1 Forwarding directive, Association, and Rendezvous Architecture (FARA)

FARA is more of a generalized model than an actual architecture. The FARA model, along with an example derivative of the model, M-FARA, are described in an article written by D. Clark et al. [12]

In order to understand FARA, we must first introduce some new terminology [12]:

• Entity is the endpoint for communications. It may be an application, a physical host or even a cluster of computers with internal communications invisible to FARA.

• Association identifies a communication between two FARA entities. An analogue is a transport-layer connection (e.g., TCP) in the current Internet architecture. Multipoint communications are not defined for the current FARA model.

• Association ID (AId) identifies an association. An analogue is the source address/port pair for TCP connections. AIds are local to an entity.

• Communication Substrate is the underlying network infrastructure dealing with the actual routing and delivering of data.

• Forwarding Directive (FD) contains the instructions on how to find a particular entity. That is, it tells the location of that entity.


• Rendezvous Information (RI). When an entity wants to start communications with another entity, it will include an RI string as a part of the first packet. The RI provides initial information necessary for the receiver to create a new association.

• FARA Directory Service (fDS) is the FARA analogue for DNS. It contains name→(FD,RI) mappings, though it is worth noting that FARA does not demand a global naming scheme.

To sum up, FARA describes a model for a point-to-point communications architecture, where the identity and location information of a communicating entity are clearly separated. FARA poses no restrictions on naming and addressing schemes, making the model easily extensible. Routing is done following the instructions in FDs, which might be dynamically updated en route. [12]
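The name→(FD,RI) mapping kept by an fDS can be illustrated with a small Python sketch. FARA leaves the format of names, FDs and RI strings open, so the plain strings below are placeholders, and the class and method names are ours rather than part of the model in [12].

```python
from typing import NamedTuple, Optional


class FdsRecord(NamedTuple):
    fd: str  # forwarding directive: where to find the entity
    ri: str  # rendezvous information to include in the first packet


class FaraDirectory:
    """Toy fDS: maps names to (FD, RI) pairs."""

    def __init__(self) -> None:
        self._records = {}

    def register(self, name: str, fd: str, ri: str) -> None:
        # Registering an existing name again replaces its record.
        self._records[name] = FdsRecord(fd, ri)

    def lookup(self, name: str) -> Optional[FdsRecord]:
        return self._records.get(name)


fds = FaraDirectory()
fds.register("printer", fd="realm-1/host-9", ri="printer-service")
before = fds.lookup("printer")

# The entity moves: only the FD changes, the name and RI stay the same.
fds.register("printer", fd="realm-2/host-3", ri="printer-service")
after = fds.lookup("printer")
```

The point of the sketch is the separation itself: a peer keeps using the same name, while the location part of the record is free to change.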

2.4.2 Host Identity Protocol (HIP) Architecture

The Host Identity Protocol (HIP) and the associated new namespace provide the basis for a new Internet architecture, where the location and identity information are separated. In this architecture, hosts are identified by cryptographic identifiers, the Host Identifiers (HIs). The location information, on the other hand, would still be provided by IP addresses. [32]

The new identifiers cannot be handled by the existing architecture. For this reason, a new protocol layer, the HIP layer, is introduced between the network layer (e.g., IP) and the transport layer (e.g., TCP). This new layer provides the required extra functionality, and also effectively decouples the network and transport layers, allowing them to evolve independently. While it is true that layering should allow that by definition, in the current Internet the network and transport layers are far too close to each other20. [32]

NAT still poses a problem for the HIP architecture. An IP address by itself does not provide sufficient information for locating a host in a private NATted network. Fortunately, HIP provides us with a globally unique identifier, which can be used to resolve the actual location. This, however, requires some support from the NAT boxes, meaning they would have to be modified.

20 For example TCP, a transport layer protocol, uses the IP address as a part of the segment checksum, effectively chaining it to a specific network layer protocol. [37]


2.4.3 The Nimrod Routing Architecture

The Nimrod Routing Architecture differs from the other solutions we have presented here by introducing a completely new routing system. [10]

The two entities within the Nimrod architecture worth noting are endpoints and nodes. According to RFC 1992 [10], “An endpoint represents a user of the internetwork layer: for example, a transport connection.” This definition is somewhat contradictory, as it would appear the connection itself is an endpoint. A more reasonable interpretation, and the one we are going to use, would be that the endpoint is actually the endpoint of a connection. In other words, an endpoint is a user (e.g., an application) of the network services.

Each endpoint is associated with one or more globally unique endpoint identifiers (EID). Each EID might then be associated with a human-readable endpoint label (EL), not unlike DNS names in the current Internet architecture. [10]

A node is a mapping from the Nimrod architecture to the physical network. It can represent a single host, a group of hosts, or even a single process inside a host. [10]

Within the Nimrod architecture, the location information is provided by locators. For nodes, the locator also functions as an identifier. Furthermore, a node can only have one locator. Endpoints, on the other hand, might be assigned multiple locators. [10]

2.4.4 NUTSS: NAT, URI, Tunnel, SIP and STUNT

The NUTSS proposal basically collects a number of existing solutions and integrates them to form a new architecture. The abbreviation NUTSS stands for its main components, some of which we have already covered. A list of the main components, with short descriptions, is provided in Table 2.2. [26]

The NUTSS proposal relies heavily on the use of SIP. SIP URIs, in the form of user@domain21, provide globally unique identifiers in the new architecture. [21]

Interactive Connectivity Establishment (ICE), an extension of SIP, deals with the more complex network scenarios involving NAT boxes. ICE uses STUN and its relay usage to handle UDP and even TCP connections (to an extent) through NAT. Unfortunately the TCP support is not complete, and thus the authors of NUTSS have proposed their own variation, STUNT. The idea behind STUNT is similar to that of STUN, but it utilizes SIP to transfer some of the information. [26]

21 A SIP URI can actually contain a lot more information than the simple variation used here [41]. In the context of identifying an endpoint in the Internet, this simple form is quite sufficient, though.

NAT      Network Address Translator. The current de facto method for
         extending the IPv4 address space. See Section 2.1.

URI      Uniform Resource Identifier. By definition, “URI is a compact
         sequence of characters that identifies an abstract or physical
         resource” [6].

Tunnel   Tunneling is necessary to allow certain protocols to pass
         through NAT boxes unhindered [21].

SIP      Session Initiation Protocol. Internet hosts can use this
         protocol for discovery (i.e., finding each other) and session
         initiation (i.e., discussing the session parameters etc.) [41].

STUNT    Simple Traversal of UDP Through NATs and TCP too. An extension
         to STUN (see Section 2.1) that allows full TCP connectivity
         through NAT boxes.

Table 2.2: The main components of the NUTSS architecture.

2.4.5 A Layered Naming Architecture for the Internet

For simplicity, we will use the abbreviation LNA to refer to this proposal, “A Layered Naming Architecture for the Internet”, by H. Balakrishnan et al.

LNA goes further than just separating the identity and location information: it suggests a naming structure22 with four layers. The layers are listed in Table 2.3. [4]

By proposing that services and content in the Internet should have their own unique identifiers, the authors are moving slightly towards a service-oriented approach, a new way of looking at Internet communications. We will discuss this approach further in the next section, Section 2.5.

LNA follows four main principles, which are listed in Table 2.4. They are direct quotations from [4]. We consider these principles to be worth keeping in mind, no matter which path the future Internet might take.

22 Here we use the term naming structure to include also location specific information (i.e., IP addresses).

ULD   User Level Descriptor. Human-readable identifying information
      about a service/content (e.g., an email address).

SID   Service Identifier. A globally unique identifier mapping to a
      service/content.

EID   Endpoint Identifier. A globally unique identifier mapping to a
      host.

NLA   Network Layer Address. The location information mapping to a
      physical host. IP addresses in the current Internet, though the
      architecture allows this to change. Note: the term NLA is our
      addition, it is not used by the authors of LNA.

Table 2.3: The naming layers of LNA.

The first design principle is fulfilled by the introduction of the new multi-layer naming structure. Names on each layer are simply associated with the entities on that layer, not crossing the layer boundaries. [4]

Another noticeable difference from the other proposals we have been covering is the usage of flat names in LNA. A flat name in this context refers to a name without any structure or semantics about the entity (e.g., a service or a host) associated with that name. This has some inherent difficulties, related to the implementation of the naming service, among others. However, it does fulfill the requirements of the second design principle. The usage of distributed hash tables (DHT) has been proposed to help with the issues. [4]

Also, flat names in general are not human-readable. This is the main purpose of the ULD naming layer: to provide a meaningful starting point for the queries, as nobody could remember a SID when looking for something on the Internet. [4]

The third and fourth design principles refer to the ability to introduce intermediaries on the route of communications. These are roughly analogous to the middleboxes (e.g., NAT boxes) in the current Internet architecture. However, by following the design principles and allowing the use of delegates and destination sequences, the middleboxes can be better integrated into the network. [4]


Principle #1: Names should bind protocols only to the relevant aspects of the underlying structure; binding protocols to irrelevant details unnecessarily limits flexibility and functionality.

Principle #2: Names, if they are to be persistent, should not impose arbitrary restrictions on the elements to which they refer.

Principle #3: A network entity should be able to direct resolutions of its name not only to its own location, but also to the locations or names of chosen delegates.

Principle #4: Destinations, as specified by sources and also by the resolution of SIDs and EIDs, should be generalizable to sequences of destinations.

Table 2.4: The design principles of LNA. [4]

2.5 Service-oriented Approach

A somewhat different approach is to completely throw away the traditional client-server ideology, and even the more dynamic P2P way of thinking. Instead, we look at the Internet as a way of connecting services and the users of services. A service can be provided by a single node in the network, like in the traditional client-server architecture, or there can be a group of nodes providing the service. We will be referring to such a group as a service group. The main point is that from the user’s point of view this is completely transparent (i.e., the user does not need to know exactly who is providing the service, just that the service exists).

The strength of the service-oriented approach is the flexibility it offers: if there are many nodes providing a certain service, the load can automatically be divided between the nodes. Furthermore, nodes can join and leave the service group without disrupting the service.

Of course, the network layer would still be involved in sending packets between hosts, using IPv4 or IPv6 or even some completely different set of protocols. There would just be an abstraction layer hiding the network functionality from the end users.


Here we will consider one such proposal, the service overlay network (SON) architecture [11].

In the SON architecture, the network layer is called the data transport plane and the abstraction layer dealing with services is called the service plane. The service plane consists of service overlay networks (SON), which we will refer to simply as service networks. Two new address types are introduced, a Service ID (SID) and an Object ID (OID). The former identifies a particular service network, and the latter a specific object within that network. A service network may dynamically map an OID to any host capable of providing the requested service, thus making the aforementioned flexibility possible. [11]

Two types of special hosts in the SON architecture are worth mentioning. One is a Service Gateway (SG), the interface point between the data transport plane and the service plane. The other is a Service Point-of-Presence (S-PoP), a gateway to a service network. Packet routing is first handled by SGs on the basis of the SID associated with that packet. Later, as the packet arrives at the S-PoP of the correct service network, it will be forwarded according to the OID to its final destination. [11]
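The two-stage forwarding described above can be sketched as two table lookups. The Python below is a simplified illustration under our own assumptions: the tables, identifiers and the "first available host" selection rule are hypothetical and not taken from [11].

```python
# Stage 1: service gateways know which S-PoP fronts each service network.
SPOP_BY_SID = {"video": "spop-video"}

# Stage 2: each S-PoP maps object IDs to the hosts currently able to serve them.
HOSTS_BY_OID = {"spop-video": {"clip-42": ["host-1", "host-2"]}}


def route(sid: str, oid: str) -> str:
    """Return the host a packet addressed to (SID, OID) would end up at."""
    spop = SPOP_BY_SID[sid]               # SGs forward on the SID alone
    candidates = HOSTS_BY_OID[spop][oid]  # the S-PoP resolves the OID
    return candidates[0]                  # assumption: pick the first available host


first_choice = route("video", "clip-42")
```

Because the OID is resolved only inside the service network, a host can leave the service group (here, be removed from the candidate list) without the sender having to change the address it uses.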

Chapter 3

Realm Aware Source Routing (RASR)

Realm Aware Source Routing (RASR) was invented by Mikael Latvala at the Nokia Research Center, in Helsinki, Finland. It is a solution closely related to IPNL and IPv4+4, which we covered in Sections 2.3.1 and 2.3.2, respectively.

In this chapter, we will give a basic introduction to RASR, from a theoretical point of view. The next chapter, Chapter 4, will provide a more detailed view of one way to actually implement the RASR functionality.

3.1 Topology

Like IPNL and IPv4+4, RASR too deals with address realms. Again, there are different types of realms: a single root realm, corresponding to the public IPv4 network, and then all the other realms. Contrary to IPNL, RASR does not limit the network layer protocol to IPv4 – an IPv6 realm is possible, as are other types of realms, within certain limits (see Section 4.5.1). [29]

The best way to think of the RASR topology is to imagine the whole Internet as a tree with the root realm on top and all the other realms branching down from that root [29]. It is not necessarily a pure tree, though, as some realms might have direct connections, not only through the root realm. An example structure is shown in Figure 3.1.

One example of directly connected realms breaking the tree topology is shown in the figure, marked with a dotted line between realms R1 and R2. It could be a dual-stack host having both IPv4 and IPv6 connectivity, the former through a NAT box, or a Realm Gateway (RG)1 in the RASR architecture, and the latter through an IPv6 gateway. This type of scenario, however, is outside the scope of this thesis.

Figure 3.1: An example realm hierarchy for RASR.

RASR addressing is based on this hierarchical, tree-like topology. A RASR address, called a Universal Address (UA), can be built simply by following the path from the root realm all the way to the end-host and concatenating the addresses of the routers connecting the realms (i.e., the RGs). The end-host’s address would naturally form the last part of the UA. By tracing the route to the end-host starting from the root realm you end up with a unique address containing all the necessary routing information. [29]
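As a minimal sketch of this construction, the Python function below concatenates the RG addresses met on the way down from the root realm and appends the end-host's address. The tuple representation and the example addresses are our own choices for illustration; the actual encoding of a UA is not specified here, and we assume each RG contributes the address it has in the realm closer to the root.

```python
def build_ua(rg_addresses, host_address):
    """Build a UA from the RG addresses (root realm first) and the host address."""
    return tuple(rg_addresses) + (host_address,)


# Hypothetical path: root RG (public address), one nested RG, then the end-host.
ua = build_ua(["192.0.2.1", "10.0.0.1"], "10.1.0.7")
```

Read from left to right, the result lists every hop a packet has to take from the root realm to the end-host, which is why a UA carries its own routing information.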

There may be more than one possible route between two hosts. This is called multihoming. It adds redundancy to the network, allowing, to an extent, uninterrupted connectivity during link failures. In the case of RASR, as the addresses themselves contain routing information, multihoming comes down to a single host having multiple addresses (UAs). This becomes a problem should an RG mentioned in the UA become unreachable for some reason. The previous RG cannot simply reroute the packet around the missing RG, as it has no knowledge of alternative routes. Instead, it will inform the sender about the problem. It is then the task of the sender to retransmit the packet using an alternative UA.

1 In this thesis, we are using the term Realm Gateway (RG) instead of the original term Interstitial Function (IF) used by M. Latvala in [29], as it better describes the network element in question.

3.2 Communications Overview

In this section we will give an overview of how communications are actually handled within the RASR architecture. The general case is presented in the form of a flow chart, which you can find in Appendix A. In order to keep the size and complexity of the flow chart within reason, we have imposed some restrictions:

1. An RG cannot be an endpoint for communications (i.e., the sender orthe recipient).

2. The sender and recipient cannot reside under the same root RG.

3. In each realm the default gateway (gw) is the RG closest to the rootrealm.

We illustrate the communication process with a simple example. Figure 3.2shows our example scenario.

Figure 3.2: An example RASR communications scenario.

Now, imagine that host A in realm R2 wants to communicate with host B in realm R3. The first step is to resolve B’s UA, which we will refer to as the Destination UA (DUA). In [29] M. Latvala, the author of RASR, proposes to use DNS for this.

From the DUA, it is easy to parse the Root Realm Address (RRA). The RRA is the public IPv4 address of the root RG closest to the destination host. Within the RASR architecture, this address is unique and as such it can be used as the destination address when the packet travels up the realm hierarchy, in our example through R1 to the root realm. However, if some realm on route to the root realm does not support IPv4 addressing, the destination address has to be temporarily replaced with an address from that address realm². Also, a flag³ has to be set to mark the change of address. [29]

In our example we only have IPv4 realms, so we do not need to be concerned about any address changes. So, host A sends a packet with destination set to RRA and source set to A towards the root realm. The complete DUA is stored within the packet as well. Furthermore, host A adds an empty template for the Source UA (SUA). The SUA will be completed on route so that by the time the packet reaches its destination, it will contain not only the complete DUA but a complete SUA for sending replies.

Normal IP routing applies within all the realms, and because the destination lies outside the current address realm, the packet will be passed to the default gw. In our example, this gw is the RG connecting realms R1 and R2.

The RG receives the packet and checks the Root Realm Visited (RRV) flag. As the packet has not yet passed through the root realm, RRV is not set. Now, the RG will add the source address to the SUA, and forward the packet through the next realm, R1.

The next host to receive the packet is the RG between the root realm and the realm R1. It will process the packet exactly the way the previous RG did, and pass it through the root realm.

Now the RG between the root realm and realm R3 receives the packet. It is the recipient of the packet, so the processing is slightly different than before: the RRV flag is set, the address of the next step is parsed and used as the destination, the DUA pointer is incremented, and the packet is sent to realm R3.
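The per-RG processing described above can be summarized as a short C sketch. It is only an illustration under stated assumptions: the `rasr_pkt` structure, the `rg_process` function and the `UA_MAX_HOPS` limit are hypothetical names, the DUA pointer is modelled as the index of the next address in the DUA, and the RG is assumed to replace the source address with its own root-facing address after recording it, much like a NAT box. None of these details is taken from [29] or from the implementation in Chapter 4.

```c
#include <stddef.h>
#include <stdint.h>

#define UA_MAX_HOPS 8   /* illustrative limit, not part of RASR */

/* Hypothetical in-memory view of a RASR packet (not a wire format). */
struct rasr_pkt {
    uint32_t src, dst;           /* network-level addresses */
    uint32_t sua[UA_MAX_HOPS];   /* Source UA collected so far */
    size_t   sua_len;
    uint32_t dua[UA_MAX_HOPS];   /* Destination UA, RRA first */
    size_t   dua_len;
    size_t   dua_ptr;            /* assumed: index of the next DUA step */
    int      rrv;                /* Root Realm Visited flag */
};

/*
 * Process a packet at an RG whose root-facing address is `own_addr`.
 * Returns 0 on success, -1 if the packet cannot be processed.
 */
int rg_process(struct rasr_pkt *p, uint32_t own_addr)
{
    if (!p->rrv && p->dst != own_addr) {
        /* Towards the root realm: record the source in the SUA, then
         * (assumption) take over as the network-level source. */
        if (p->sua_len >= UA_MAX_HOPS)
            return -1;
        p->sua[p->sua_len++] = p->src;
        p->src = own_addr;
        return 0;
    }
    /* The RG is the recipient: mark the root realm as visited, use the
     * next DUA step as the destination and advance the pointer. */
    if (p->dua_ptr >= p->dua_len)
        return -1;
    p->rrv = 1;
    p->dst = p->dua[p->dua_ptr++];
    return 0;
}
```

Applied once per RG to the scenario of Figure 3.2, the first two calls each append one address to the SUA, and the third call sets RRV and places host B’s address in the destination field.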

Host B finally receives the packet, and we check that it really is the final recipient. For the replies, the roles of SUA and DUA are swapped. In other words, the SUA from the received packet is used as the DUA for sending replies. The DUA is not necessarily needed, but it can be used to optimize the communications process for certain scenarios. More details about this can be found in Section 4.6.

² In the flow chart, default gateway (gw) address is used, as it is easy to find out.

³ The flag in question is called Borrowed Destination Address (BDA) flag.

3.3 Protocol

In contrast to both IPNL and IPv4+4, RASR does not require adding a new packet header. Instead it uses the extension mechanisms of the existing network layer protocols [29]. The motivation for this is to make as few changes to the existing Internet architecture and protocols as possible, thus facilitating easier adoption of the proposal.

For IPv4, which we concentrate on, IP Options provide the aforementioned extension mechanism. RASR defines two new IP Options, the SUA Option and the DUA Option. Figure 3.3 presents the structure of the SUA option. The structure of the DUA option is shown in Figure 3.4. [29]

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |    Length     |     Flags     |     UA...     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           ...UA...                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 3.3: SUA IPv4 Option. Variable length.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |    Length     |    Pointer    |     Flags     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             UA...                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 3.4: DUA IPv4 Option. Variable length.

The SUA and DUA options define the sender and the receiver of the packet, respectively, using RASR addresses – UAs. RGs use these options to route the packets. They also modify the network level source and destination addresses as needed, much like NAT boxes do nowadays, to allow the packet to travel through the next realm on its route. [29]


The IP specification [36] defines two cases for IP Options:

1. The option contains a single octet⁴ of option-type.

2. The option contains an option-type octet, an option-length octet, and the actual option-data octets.

The SUA and DUA options fall into the second category. The Type field corresponds to the option-type octet, the Length field to the option-length octet and the rest to option-data.

Following the IP specification [36], the Type field consists of three parts: a 1 bit copied flag followed by a 2 bit option class and ending with a 5 bit option number. For RASR, the copied flag is set, requiring the option to be copied into every fragment, should the datagram be fragmented. The option class is set to 0, the control class.

For the SUA option, the option number is set to 14. Putting the three values together, we get bit string 1 00 01110 as the SUA option type. In the case of the DUA option, we use 15 as the option number. This affects the Least Significant Bit (LSB) in the option type field, giving 1 00 01111 as the result. Later we will use the shorter Hex format, in which the SUA option type is 0x8E and the DUA option type is 0x8F.
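The arithmetic behind these two values can be checked with a few lines of C. The helper name `ipopt_type` is ours and purely illustrative; the bit layout (1 bit copied flag, 2 bit class, 5 bit number) is the one given in the IP specification [36].

```c
#include <stdint.h>

/*
 * Compose an IPv4 option-type octet from its three parts:
 *   bit 7      copied flag
 *   bits 6..5  option class
 *   bits 4..0  option number
 */
uint8_t ipopt_type(unsigned copied, unsigned opt_class, unsigned number)
{
    return (uint8_t)(((copied & 0x1u) << 7) |
                     ((opt_class & 0x3u) << 5) |
                     (number & 0x1Fu));
}
```

Calling `ipopt_type(1, 0, 14)` yields 0x8E for the SUA option and `ipopt_type(1, 0, 15)` yields 0x8F for the DUA option.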

The Length field gives the length of the whole option in octets, counting the Type and Length octets as well as the option data [36].

The next field in the DUA option, the Pointer field, does not appear in the SUA option. Its function is to keep track of where the packet is (i.e., in which realm), on the way from the root realm towards the destination. With the help of the pointer, the address of the next RG (or the final destination) can be resolved. [29]

The SUA option only defines one flag⁵, the Borrowed Source Address (BSA) bit. The DUA option also defines a similar flag, the Borrowed Destination Address (BDA) bit. Normally, when traversing the RASR realm hierarchy towards the root realm, the destination address is the RRA. The BDA and BSA flags are used when the packet passes through realms in which the root realm address is not valid. [29]

There is also another flag defined in the DUA option, the Root Realm Visited (RRV) bit. This is used to mark whether the packet is traveling towards the root realm (RRV not set) or from the root realm towards the destination (RRV set). [29]

⁴ An octet is a group of 8 bits. We use it when talking about protocol fields etc. A byte, on the other hand, is a measure of length in our context.

⁵ The Flags field is an octet, so there is space for extension.
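To make the layout of Figure 3.4 concrete, the following C sketch writes a DUA option into a buffer. It is a hedged illustration rather than the code of our implementation: `dua_encode` and `dua_rrv_is_set` are hypothetical helpers, the Pointer value is stored exactly as passed in because its encoding is not specified here, and the RRV bit value 0x80 is the one used by the `DUA_RRV` constant in Section 4.2.1.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IPOPT_DUA      0x8F  /* option type derived above */
#define DUA_RRV        0x80  /* RRV bit, as in DUA_RRV of Section 4.2.1 */
#define DUA_HDR_LEN    4     /* Type, Length, Pointer, Flags */
#define IP_OPT_MAX_LEN 40    /* room for options in an IPv4 header */

/*
 * Write a DUA option into `buf`. `ua` holds `ua_len` octets of address
 * data. Returns the total option length, or 0 if it does not fit.
 */
size_t dua_encode(uint8_t *buf, size_t cap, const uint8_t *ua, size_t ua_len,
                  uint8_t pointer, int rrv)
{
    size_t total = DUA_HDR_LEN + ua_len;

    if (total > IP_OPT_MAX_LEN || total > cap)
        return 0;
    buf[0] = IPOPT_DUA;
    buf[1] = (uint8_t)total;        /* counts Type and Length octets too */
    buf[2] = pointer;               /* encoding left to the caller */
    buf[3] = rrv ? DUA_RRV : 0;
    if (ua_len > 0)
        memcpy(buf + DUA_HDR_LEN, ua, ua_len);
    return total;
}

/* Test the RRV bit of an encoded DUA option. */
int dua_rrv_is_set(const uint8_t *opt)
{
    return (opt[3] & DUA_RRV) != 0;
}
```

Note that the 40-octet limit applies to all options of a packet together, so the SUA and DUA options have to share that space.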

Chapter 4

RASR Proof-of-concept Implementation

In this chapter we present our implementation of the RASR functionality. We begin by describing our development environment, both the hardware and the software, in Section 4.1.

Our choice for the software platform was Linux. In Section 4.2 we explain how to add the RASR support into the Linux kernel. In addition to the kernel modifications, we also need applications that support RASR. This is the subject of Section 4.3.

The rest of the chapter is spent discussing our testing and debugging practices (Section 4.4), the limitations that RASR, and specifically our implementation, have (Section 4.5), and finally some possible optimization methods (Section 4.6).

4.1 Environment

We began the implementation project at the Nokia Research Center, in Helsinki, Finland, during the summer of 2006. The software platform was predetermined to be Linux. Available hardware consisted of a couple of workstation computers (x86-64¹), a laptop (x86²), Buffalo WHR-G54S WLAN routers and Nokia N770 Internet Tablets. The idea was to do the development work on the PCs and port the implementation to the other devices. We did succeed in porting RASR to the WLAN routers, but left the N770s as future work.

¹ We use x86-64 to refer to a 64 bit Intel x86 compatible processor.

² We use x86 to refer to a 32 bit Intel x86 compatible processor.

Possible reasons behind the decision to do the implementation in a Linux environment include:

• The Linux kernel code is open source³.

• Linux kernel development is extensively covered in literature.

• There is a large, and active, community behind the Linux kernel, so getting help is fairly easy.

• It is possible to run Linux on several embedded devices, including WLAN routers (e.g., Buffalo WHR-G54S).

• The N770 Internet Tablet is Linux-based.

4.2 Modifications to the Linux Kernel

We have taken as our reference kernel the official ’vanilla’ kernel⁴. New kernel releases (2.6.x) come out every three months or so with smaller sub-releases (2.6.x.y) in between. We have done most of the RASR development with 2.6.17.y and 2.6.18.y kernels. Luckily, the TCP/IP networking code has not changed much, making it possible to upgrade the kernel with reasonable effort. Porting the code to a 2.4 series kernel was also necessary to add RASR support for the Buffalo WLAN router. For the rest of this section, however, we will concentrate on the 2.6 kernel series. Specifically, if not mentioned otherwise, we will be referring to the 2.6.18 kernel.

In this section we cover the necessary modifications to add RASR support into the Linux kernel. We will not go into details, though, but instead point out which files in the kernel code were modified, and describe the changes on a more abstract level.

We implemented RASR as a built-in functionality of the kernel. Another possibility would have been to use kernel modules and netfilter hooks⁵.

³ For an interesting, though slightly off-topic discussion concerning Free Software vs. Open Source semantics, take a look at R. Stallman’s essay on the subject [47].

⁴ The ’vanilla’ kernel releases are available for download at http://www.eu.kernel.org/.

⁵ From http://www.netfilter.org/: “netfilter is a set of hooks inside the Linux kernel that allows kernel modules to register callback functions with the network stack. A registered callback function is then called back for every packet that traverses the respective hook within the network stack.”



A flow chart in Appendix B provides an overview of how packets travel through the TCP/IP stack of a RASR-enabled Linux kernel. The packet flow on the sending host has been omitted, as there really is not that much interesting RASR functionality there. The modifications that have been done on the sender side are mostly in the file net/ipv4/ip_output.c, with some small changes in files net/ipv4/icmp.c and net/ipv4/tcp_ipv4.c. These files and their contents are discussed further in Section 4.2.2.

In our implementation, we have relied on the use of RAW sockets. This means that most of the sender side logic (e.g., parsing the UA and creating the RASR options) is handled in user space, before passing the data to the networking stack. It was easy enough to implement, but has its shortcomings, making it infeasible for production use. This issue is further discussed in Section 4.5.2.

4.2.1 Affected header files (.h)

The Linux kernel, being mostly C code, uses header files containing the declarations of exported functions (i.e., functions that are intended to be called from outside their respective source files), variables and constants etc. There is also a fairly large amount of actual program code embedded in the kernel header files, sometimes making it quite difficult to follow the kernel instruction flow.

In this section we discuss the modifications to the header files related to RASR, one file at a time. The code snippets you see here are from the actual RASR implementation patch. They are not complete, but should help in understanding the structure of our implementation.

include/net/ip.h

@@ -182,6 +184,12 @@

This type of line appears here and there, indicating the location of the modifications inside a file.

 /* From ip_output.c */
 extern int sysctl_ip_dynaddr;

+#ifdef CONFIG_NET_RASR
+extern int sysctl_rasr_enabled;
+extern int sysctl_rasr_encapsulate;
+extern int sysctl_force_encapsulate;
+#endif
+
 extern void ipfrag_init(void);

The lines with the plus sign (’+’) are the actual modifications⁶, other lines are just for reference. In this snippet we have an ifdef-block, which controls whether the code contained in the block is actually compiled into the kernel or not. The decision depends on the argument, in this case CONFIG_NET_RASR, which is an example of a kernel configuration variable. We will talk more about the kernel configuration in Section 4.2.3. This particular block declares a couple of sysctl variables, which can be used to control the RASR functionality somewhat on a running system. More details on sysctl can be found in Section 4.2.2.

 #ifdef CONFIG_INET
@@ -347,7 +355,23 @@
 extern void ip_forward_options(struct sk_buff *skb);
 extern int ip_options_rcv_srr(struct sk_buff *skb);

+#ifdef CONFIG_NET_RASR
+extern void ip_options_build_alloc_extra_mem(struct sk_buff *skb, \
      struct ip_options *opt, u32 daddr, struct rtable *rt, int is_frag, \
      int to_alloc);
+extern int ip_options_rcv_rasr(struct sk_buff *skb);
+extern struct ip_options *hop_over_root_rg(struct ip_options *opt, \
      int *opt_changed);
+extern int checksum_recal(struct sk_buff *skb);
+extern __u32 getrootaddr(struct ip_options *opt);
+extern __u32 getdaddr(struct ip_options *opt);
+#endif
+
 /*
+ * Functions provided by ip_output.c
+ */
+#ifdef CONFIG_NET_RASR_ENCAPSULATE
+extern struct sk_buff * rasr_encapsulate(struct sk_buff *skb);
+#endif
+
+/*
  * Functions provided by ip_sockglue.c
  */

⁶ Additions, to be exact. When something is removed from the original file, or replaced with something, the minus sign (’-’) is used.

The exported RASR functions are introduced here. There are seven of them, most of which are defined in net/ipv4/ip_options.c. The only exception is rasr_encapsulate, which is defined in net/ipv4/ip_output.c. These functions are further described in Section 4.2.2. The backslash at the end of a line marks the continuation of that line onto the next. It is used here for presentation purposes, and does not appear in the actual header file.

One other thing worth noting is the parameter struct sk_buff *skb common to four of the seven RASR functions. The data structure, defined in include/linux/skbuff.h, is one of the most central parts of the whole Linux TCP/IP stack. It contains the outgoing or incoming data packet as the packet moves through the networking stack. It is used by several protocol layers, thus making the structure quite large. We have attached the definition of sk_buff as Appendix D. For a more detailed explanation of the sk_buff structure, we encourage you to take a look at Section 2.1 in [5].

include/net/inet_sock.h

@@ -42,6 +47,10 @@
   unsigned char srr;
   unsigned char rr;
   unsigned char ts;
+ #ifdef CONFIG_NET_RASR
+  unsigned char sua;
+  unsigned char dua;
+ #endif
   unsigned char is_setbyuser:1,
                 is_data:1,
                 is_strictroute:1,

This code snippet is a part of struct ip_options, a kernel data structure used to store the IP Options of an IPv4 packet. Here we add the pointer variables sua and dua for the SUA and DUA options, respectively. When the options are present, these variables are used to locate the actual options. The actual location might be within the ip_options structure, or the sk_buff structure (i.e., with the packet headers). This is marked by the is_data variable. The circumstances affecting the location are outside the scope of this thesis. However, it should be noted that the ip_options structure is only meant for temporary storage. At some point the options have to be stored within the sk_buff structure as they are part of the IP header.
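How such locator variables can be filled in is sketched below in user-space C. This is not the kernel code: `rasr_find_options` and `struct rasr_offsets` are hypothetical, and for simplicity the offsets are counted from the start of the options area, with -1 meaning that the option is absent. The walk itself follows the two option formats of the IP specification [36], a single octet for End of Option List (0) and No Operation (1), and type, length, data for everything else.

```c
#include <stddef.h>
#include <stdint.h>

#define IPOPT_END  0x00
#define IPOPT_NOOP 0x01
#define IPOPT_SUA  0x8E
#define IPOPT_DUA  0x8F

/* Offsets into the options area; -1 means "option not present". */
struct rasr_offsets {
    int sua;
    int dua;
};

/*
 * Walk the options area of an IPv4 header and note where the SUA and DUA
 * options start. Returns 0 on success, -1 if the options are malformed.
 */
int rasr_find_options(const uint8_t *opts, size_t len,
                      struct rasr_offsets *out)
{
    size_t i = 0;

    out->sua = -1;
    out->dua = -1;
    while (i < len) {
        uint8_t type = opts[i];

        if (type == IPOPT_END)
            break;                  /* end of option list */
        if (type == IPOPT_NOOP) {
            i++;                    /* single-octet option */
            continue;
        }
        /* Every other option carries a length octet covering itself. */
        if (i + 1 >= len || opts[i + 1] < 2 || i + opts[i + 1] > len)
            return -1;
        if (type == IPOPT_SUA)
            out->sua = (int)i;
        else if (type == IPOPT_DUA)
            out->dua = (int)i;
        i += opts[i + 1];
    }
    return 0;
}
```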

include/linux/sysctl.h

@@ -411,6 +411,9 @@
   NET_IPV4_TCP_WORKAROUND_SIGNED_WINDOWS=115,
   NET_TCP_DMA_COPYBREAK=116,
   NET_TCP_SLOW_START_AFTER_IDLE=117,
+  NET_RASR_ENABLED=118, // for RASR support
+  NET_RASR_ENCAPSULATE=119, // for RASR support
+  NET_FORCE_ENCAPSULATE=120, // force IP-in-IP encapsulation \
       for ALL outgoing packets
 };

Here we assign values to the sysctl variables we already came across in include/net/ip.h. We will continue this subject in Section 4.2.2.

include/linux/ip.h

@@ -62,6 +66,12 @@
 #define IPOPT_SSRR (9 |IPOPT_CONTROL|IPOPT_COPY)
 #define IPOPT_RA (20|IPOPT_CONTROL|IPOPT_COPY)

+#ifdef CONFIG_NET_RASR
+#define IPOPT_SUA (14|IPOPT_CONTROL|IPOPT_COPY)
+#define IPOPT_DUA (15|IPOPT_CONTROL|IPOPT_COPY)
+#define DUA_RRV 0x80
+#endif
+
 #define IPVERSION 4
 #define MAXTTL 255
 #define IPDEFTTL 64

The SUA and DUA option types are defined here. Furthermore, we define the only RASR option flag used in our implementation, the RRV flag. The option types and flags were already covered in more detail in Section 3.3.

4.2.2 Affected code files (.c)

In this section we go through the RASR modifications in the actual kernel code, one file at a time. For each file, the functions important to the RASR functionality are listed and their purpose explained. Some minor helper functions may be omitted in favor of clarity.

net/ipv4/sysctl_net_ipv4.c

This file brings together the sysctl control variable definitions from the kernel header files (see Section 4.2.1) and also binds them to the /proc/sys/ pseudo file system. Looking at the file listing of /proc/sys/net/ipv4/ in Figure 4.1, we notice the files force_encapsulate, rasr_enabled and rasr_encapsulate. They are pseudo files, bound to the corresponding sysctl variables inside the kernel. The values of those variables can be changed simply by editing the pseudo files.

conf                               ip_nonlocal_bind        tcp_mem
force_encapsulate                  ip_no_pmtu_disc         tcp_moderate_rcvbuf
icmp_echo_ignore_all               neigh                   tcp_mtu_probing
icmp_echo_ignore_broadcasts        netfilter               tcp_no_metrics_save
icmp_errors_use_inbound_ifaddr     rasr_enabled            tcp_orphan_retries
icmp_ignore_bogus_error_responses  rasr_encapsulate        tcp_reordering
icmp_ratelimit                     route                   tcp_retrans_collapse
icmp_ratemask                      tcp_abc                 tcp_retries1
igmp_max_memberships               tcp_abort_on_overflow   tcp_retries2
igmp_max_msf                       tcp_adv_win_scale       tcp_rfc1337
inet_peer_gc_maxtime               tcp_app_win             tcp_rmem
inet_peer_gc_mintime               tcp_base_mss            tcp_sack
inet_peer_maxttl                   tcp_congestion_control  tcp_slow_start_after_idle
inet_peer_minttl                   tcp_dsack               tcp_stdurg
inet_peer_threshold                tcp_ecn                 tcp_synack_retries
ip_conntrack_max                   tcp_fack                tcp_syncookies
ip_default_ttl                     tcp_fin_timeout         tcp_syn_retries
ip_dynaddr                         tcp_frto                tcp_timestamps
ip_forward                         tcp_keepalive_intvl     tcp_tso_win_divisor
ipfrag_high_thresh                 tcp_keepalive_probes    tcp_tw_recycle
ipfrag_low_thresh                  tcp_keepalive_time      tcp_tw_reuse
ipfrag_max_dist                    tcp_low_latency         tcp_window_scaling
ipfrag_secret_interval             tcp_max_orphans         tcp_wmem
ipfrag_time                        tcp_max_syn_backlog     tcp_workaround_signed_windows
ip_local_port_range                tcp_max_tw_buckets

Figure 4.1: The file listing of /proc/sys/net/ipv4/ from a RASR enabled system.

The three RASR related sysctl variables are used as boolean variables (i.e., 0: false, >0: true); force_encapsulate can be used to force every IP packet to be encapsulated⁷, rasr_enabled to control whether to use the RASR functionality or handle the packets traditionally and rasr_encapsulate to control whether the RASR packets should be encapsulated.
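Reading one of these pseudo files from a program could look roughly like the C sketch below. The helper names `sysctl_parse_bool` and `rasr_sysctl_read` are hypothetical, and the pseudo files only exist on a kernel carrying our patch; on any other system the read simply fails. Negative values, which the boolean convention above does not cover, are reported as invalid here.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Interpret the text read from a sysctl pseudo file as a boolean:
 * 0 is false, any positive integer is true. Returns 1, 0, or -1 when
 * the text is not a non-negative integer.
 */
int sysctl_parse_bool(const char *text)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(text, &end, 10);
    if (end == text || errno != 0 || v < 0)
        return -1;
    return v > 0;
}

/* Read /proc/sys/net/ipv4/<name>; returns 1, 0, or -1 on any failure. */
int rasr_sysctl_read(const char *name)
{
    char path[128], buf[32];
    FILE *f;
    int n;

    n = snprintf(path, sizeof path, "/proc/sys/net/ipv4/%s", name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return sysctl_parse_bool(buf);
}
```

On a RASR enabled system, `rasr_sysctl_read("rasr_enabled")` would then tell whether RASR processing is currently switched on.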

net/ipv4/ip_output.c

This is the file where most of the sender side functionality of the IP layer resides. The functions interesting to us (i.e., containing RASR modifications) are ip_build_and_send_pkt, ip_queue_xmit, ip_push_pending_frames and ip_send_reply. Their purpose is essentially identical: to prepare data coming from different transport layer protocols into IP packets. They differ in what they actually do to the incoming data, depending on how much work has already been done on the transport layer. For more information about these functions, take a look at Section 21.1 in [5].

The RASR modifications within this file mainly relate to packet encapsulation⁸ and one special scenario. In this scenario, the sender is the root RG closest to the destination. So, the RRA in this case actually is the current host’s address, meaning one step can be “hopped over”. This is exactly what the modifications achieve, with the help of the hop_over_root_rg function, defined in the ip_options.c file which we cover next.

⁷ This is only useful for testing purposes.

⁸ Using encapsulation was one of our ideas on how to deal with the problems related to packets containing IP options. For further details, see Section 4.5.

net/ipv4/ip_options.c

This file contains most of the RASR functionality. Let us introduce some of the functions in this file relevant to RASR:

• ip_options_build_alloc_extra_mem An exported function, specific to RASR. This function is only needed on special occasions, when the RASR options are extended while being stored within the ip_options structure. In this type of situation the sk_buff structure has to be extended as well before the options can be stored there.

• ip_options_echo An exported function, modified to support RASR. It handles the copying of options for the reply packets. ICMP and TCP, for example, utilize this function.

• hop_over_root_rg An exported function, RASR specific. Implements the “hopping over” of the first step in DUA, when applicable.

• ip_options_compile An exported function, modified to support RASR. This function parses the IP options from incoming packets and fills the ip_options structure.

• ip_options_update_sua and ip_options_update_dua Internal RASR helper functions. They update the SUA and DUA options, respectively.

• ip_options_rcv_rasr An exported function. The main RASR handler. Updates SUA and DUA options and source and destination addresses as needed.

• getrootaddr and getdaddr Exported RASR helper functions. The former parses and returns the RRA from the given ip_options structure, the latter the next step from DUA.

• checksum_recal An exported RASR helper function. Recalculates checksums⁹.
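Why an address rewrite forces a checksum update can be seen from a small user-space sketch of the standard Internet checksum, the 16 bit ones’ complement sum used for the IPv4 header and, together with a pseudo-header containing the addresses, for TCP and UDP. The functions `inet_checksum` and `ip_set_src` below are illustrative stand-ins written for this text; they are not the body of checksum_recal, which works on sk_buff structures inside the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones' complement checksum over `len` octets (big-endian 16 bit words). */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)data[0] << 8;      /* pad the odd octet with zero */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16); /* fold the carries back in */
    return (uint16_t)~sum;
}

/*
 * Replace the source address of an IPv4 header (octets 12..15) and
 * refresh the header checksum (octets 10..11).
 */
void ip_set_src(uint8_t *hdr, size_t hdr_len, uint32_t src)
{
    uint16_t sum;

    hdr[12] = (uint8_t)(src >> 24);
    hdr[13] = (uint8_t)(src >> 16);
    hdr[14] = (uint8_t)(src >> 8);
    hdr[15] = (uint8_t)src;
    hdr[10] = 0;                             /* checksum field counts as zero */
    hdr[11] = 0;
    sum = inet_checksum(hdr, hdr_len);
    hdr[10] = (uint8_t)(sum >> 8);
    hdr[11] = (uint8_t)sum;
}
```

A header whose checksum field is correct sums to zero when passed through `inet_checksum`, which gives a simple way to verify the result after a rewrite.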

net/ipv4/ip_input.c

This file contains the main IP layer input handler functions. There is a function called ip_rcv_options defined in this file, whose task it is to check for IP options in incoming packets and act accordingly. All that we needed to do was to add a check for RASR options and, if they were present, pass the control to the RASR handlers in ip_options.c.

net/ipv4/icmp.c and net/ipv4/tcp_ipv4.c

These files contain code related to the ICMP and TCP protocols, respectively. Common to both of these protocols is the notion of a reply packet. One can, of course, send replies using any transport protocol, but those replies are handled in user space, not within the networking stack.

When it comes to reply packets, the normal practice is to use the source address from the original packet as the destination for the reply. This works even with RASR in normal circumstances. In our implementation, however, we have replaced the source address with the Source UA. In the case of TCP replies this is most likely not necessary, but it does no harm and the effect it has on the performance should be negligible¹⁰.

⁹ Changing the source and/or the destination address affects some checksums (e.g., TCP checksum), which have to be recalculated before the packet can be passed on.

4.2.3 Using the Kernel Configuration System (Kconfig)

The complete Linux kernel is a huge entity, containing a lot more than any single system needs to function. This is where the Kernel configuration system comes into play, controlling which parts of the kernel source are included in the compilation process.

In 2.6 kernels, the base of the configuration system consists of Kconfig files, present in every kernel source directory. These files define configuration variables and related descriptions. make menuconfig and other kernel configuration methods read these files and allow the user a fairly simple way of choosing what to include into the kernel. Within the actual kernel source files, ifdef blocks are used to achieve the desired configuration.

More information about the kernel configuration options can be found in most books dealing with kernel development. One example is the book “Linux Kernel Development” by R. Love [30].

4.3 RASR Support for Applications

4.3.1 Wireshark

Wireshark¹¹, originally known as Ethereal¹², is a packet sniffer with fairly good packet analyzing capabilities. For protocol development, such an application is particularly important, as developers need a way to check whether their test packets are constructed correctly.

We needed to modify Wireshark slightly, so that it would recognize RASR options (i.e., SUA and DUA) and display them correctly. The relevant file in the Wireshark version 0.99.2 (i.e., the version we used) sources is epan/dissectors/packet-ip.c, which handles IP packet disassembly. Here we basically imitated the existing IP option handlers to achieve support for RASR. A screenshot from the modified Wireshark application in action is included in Figure 4.2.

¹⁰ Our first priority has been to get the implementation working – optimization is for later.

¹¹ http://www.wireshark.org/

¹² http://www.ethereal.com/

The figure shows the parts of an ICMP Echo request (i.e., ping) packet relevant to RASR. The SUA and DUA options are unfolded for easy examination. The original packet is shown in the lower part of the window as Hex code. It is easy to see the advantage in this kind of an application, which decodes the required information for you automatically.

Figure 4.2: A screenshot from Wireshark with RASR support enabled.

4.3.2 Ping

Ping is one of the most commonly used networking utilities, and one of the simplest as well. The original version of ping was programmed by M. Muuss in 1983¹³. Since then the utility has evolved quite a bit. Still, we decided to build our RASR enabled ping version on top of the original code¹⁴.

One reason for choosing the original ping was that the more recent ping versions are usually bundled with some other networking utilities, thus making it harder to find the proper sources. Another reason, maybe even more compelling than the first one, is the fact that the recent ping versions include a lot more functionality than the original, making their source files necessarily bigger and more complicated. For us the most important thing was to get basic ping functionality (i.e., sending ICMP Echo queries and receiving the replies) working, so that we could test the basic network connectivity between two RASR hosts. The original ping is more than enough to fill that purpose.

It has to be noted, though, that the original ping source code is over twenty years old and does not compile in a modern system without some modifications. The patch attached as Appendix E includes these modifications.

The RASR related modifications include parsing the Destination UA from the user input, initiating the SUA and DUA options, and passing them to the networking stack using the setsockopt socket API call. We also introduced the possibility to control the time-to-live (TTL) value of the packets being sent. This way the route the packet takes can be better examined. Another popular networking utility, traceroute, is based on the same idea.
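The two socket calls involved can be sketched as follows. This is an assumption-laden illustration, not an excerpt from the patch in Appendix E: the wrapper names `sock_set_ip_options` and `sock_set_ttl` are hypothetical, and we assume here that the pre-built option octets are handed over with the standard IP_OPTIONS socket option. IP_OPTIONS and IP_TTL themselves are regular IPPROTO_IP level socket options on Linux, but a kernel without RASR support cannot be expected to accept or act on option types 0x8E and 0x8F.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

#define IP_OPTIONS_MAX 40   /* an IPv4 header has room for 40 option octets */

/* Attach pre-built IP options (e.g., SUA and DUA) to a socket. */
int sock_set_ip_options(int fd, const uint8_t *opts, size_t len)
{
    if (opts == NULL || len == 0 || len > IP_OPTIONS_MAX)
        return -1;
    return setsockopt(fd, IPPROTO_IP, IP_OPTIONS, opts, (socklen_t)len);
}

/* Set the TTL of outgoing packets; valid values are 1..255. */
int sock_set_ttl(int fd, int ttl)
{
    if (ttl < 1 || ttl > 255)
        return -1;
    return setsockopt(fd, IPPROTO_IP, IP_TTL, &ttl, sizeof ttl);
}
```

Sending a series of probes with TTL values 1, 2, 3 and so on makes each router along the path, in turn, the point where the packet expires, which is the idea traceroute builds on.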

4.3.3 GNU Netcat

Netcat is a general purpose TCP/UDP communications utility. It is used to open a communications channel between two hosts. What goes onto that channel, however, is not specified by netcat. Basically any application layer protocol (e.g., HTTP) could be implemented on top of netcat. In fact, netcat can be used as a simple HTTP client much the same way as telnet¹⁵. The GNU Netcat¹⁶ is a rewrite of the original Netcat 1.10¹⁷. We used the GNU Netcat version 0.7.1, which we will be referring to as simply netcat from this on, as our starting point.

¹³ For a short history of ping, take a look at http://ftp.arl.mil/~mike/ping.html

¹⁴ The original ping source code is available through http://directory.fsf.org/ping.html

¹⁵ Telnet is an ASCII terminal utility. It was commonly used for remote connections before SSH.

¹⁶ http://netcat.sourceforge.net/

¹⁷ http://www.vulnwatch.org/netcat/


The modifications needed to add RASR support for netcat are fundamentally the same as with ping. That is, parsing the Destination UA from the user input, initiating the SUA and DUA options, and passing them to the networking stack using setsockopt. Of course, netcat is a lot more complicated and we wanted to handle both TCP and UDP communications.

TCP is a connection-oriented protocol, meaning that a stateful connection is formed between the two communicating hosts. This requires support from the networking stack, as information about the connection has to be stored somewhere. Luckily for us, this information includes the IP options from the incoming packets. The options are echoed on the reply packets, so all we needed was to modify this echo function in the kernel to do the necessary swapping of SUA/DUA options. No user space modifications are necessary on the receiving end (i.e., on the “server side”).

UDP, on the other hand, is a connectionless protocol. Thus, to emulate two-way communications, as netcat does, the necessary RASR option handling has to be done by the application itself.

4.3.4 OpenSSH

OpenSSH¹⁸ is an open source version of the popular Secure Shell (SSH) communications utility. SSH provides encrypted communications between two hosts. This is the first and so far the only “real”¹⁹ application for which we have added RASR support.

On the transport layer, SSH uses TCP. Thus, no user space modifications are necessary on the server side, as the replies are handled by the networking stack. The client side requires the same changes as the other applications we have covered this far: parsing the Destination UA from the user input, initiating the SUA and DUA options, and passing them to the networking stack using setsockopt.

4.4 Testing and debugging

During the development phase of any software component, it is necessary to be able to test the implementation against possible use cases. For telecommunications software, and especially networking protocols, this means having the ability to test against different network topologies.

¹⁸ http://www.openssh.org/

¹⁹ Real in the sense that its main uses are not in development and testing.

The fact that our implementation resides in the kernel space poses anotherchallenge. Time is always somewhat of an issue, and especially in the de-velopment phase it is important that testing can be done frequently andfairly quickly. However, kernel space modifications generally require the re-compilation of the whole kernel and a reboot to get the new kernel active.Furthermore, a bug in the kernel code may easily hang the whole system,making debugging a tedious exercise.

Writing the code as a kernel module helps with some of the issues. However,built-in functionality has better access to the kernel data structures andfunctions and besides, it was easier to modify the existing IPv4 code20 thanto write a new module from scratch.

For us, the help came from a Linux virtualization method called User ModeLinux21 (UML22). The author of UML is J. Dike, who continues to developit further, and has also written a nice book about the subject [16]. UMLhas also been included into the official Linux ’vanilla’ kernel, making it easilyavailable for kernel developers. This was one of the reasons we decided toadopt UML as our virtualization tool of choice.

UML is a Linux-specific virtualization method. Fundamentally, it allows the Linux kernel to be compiled as a user-space executable (i.e., allowing Linux to be run on top of Linux). The resulting program can be run like any normal application, and even debugged like one (see Appendix C for more details). The plain UML kernel, however, is not enough by itself – it needs a bundle of user space programs and files with which to form a running Linux system.

This collection of user space programs and files is called a distribution. Debian, Ubuntu, Fedora and Slackware, among many others, are Linux distributions. Often they ship with a version of the kernel slightly different from the official ’vanilla’ kernel. With UML the easiest way to achieve a working system is to download a file system image with a pre-installed Linux distribution and use that with the UML kernel23.

The advantages UML offers include the already mentioned fact that it can be debugged like any normal application. Related to this, a bug in the UML kernel can hang the UML instance24 in question, but does not affect the system as a whole. The hung UML instance can be killed, the bug fixed and the UML kernel recompiled ready for further testing, all without rebooting the host machine (i.e., the machine on which the UML kernel is run).

20 As far as we can tell, the base IPv4 code in the kernel is built-in. IPv6 code, on the other hand, can be built as a module.

21 http://user-mode-linux.sourceforge.net/

22 Not to be confused with the Unified Modeling Language. We use the acronym UML solely to refer to User Mode Linux.

23 A good collection of UML file system images and precompiled UML kernels is available at http://uml.nagafix.co.uk/.

Furthermore, several UML instances can be started simultaneously, virtually as many as the resources on the host machine allow. This, with the fact that UML allows virtual networking between UML instances, and even with the outside world through the host machine, is perfect for our development phase testing needs. Figure 4.3 shows a couple of test scenarios we frequently used during the development phase.

Figure 4.3: A couple of RASR test scenarios involving virtual UML hosts.

24 Running the UML kernel executable creates a virtual machine, which we will refer to as a UML instance.


4.5 Limitations

4.5.1 Limitations of RASR in General

The Usage of IPv4 Options

RASR relies heavily on IPv4 Options when traversing IPv4 networks. As the RASR infrastructure defines the root realm to be IPv4, there is no way of escaping their use.

One limiting factor related to IPv4 Options is their length. The IPv4 header is limited in length by the size of its Internet Header Length (IHL) field, 4 bits, which allows for at most 15 units for the whole header. A unit here is a 32-bit word (i.e., 4 bytes). The constant-length part of the IP header is 5 units (20 bytes), so only 10 units (40 bytes) are left for the options. [36]

The available length should be enough for most cases, though. SUA and DUA options reserve 3 and 4 bytes, respectively, for their own “headers”. This leaves about 8 units (32 bytes) for the actual address information, which has to be divided between the SUA and the DUA. So, in the worst case scenario, the maximum length for a UA is 4 units (16 bytes). This is more than enough if the addresses within the UAs are fairly short, like IPv4. With long addresses, like IPv6, we have a slight problem, as an IPv6 address with its 128 bits (16 bytes) would fill up the whole worst-case UA.

Even the IPv6 scenario is manageable, with certain assumptions. First of all, we assume that there is only one IPv6 realm. Furthermore, there shall be no realms below the IPv6 realm. Both of these assumptions make sense, because of the vastness of the IPv6 address space – there simply is no need for several IPv6 realms or for any other realms below the IPv6 realm. Now, connections within the IPv6 realm are possible without RASR so no problem there. As for the connections with hosts outside the IPv6 realm, either the SUA or the DUA would contain an IPv6 address, but never both of them at the same time. Thus, more space can be reserved for whichever UA contains the IPv6 address. Of course, the other UA would be shortened in turn, but should still allow at least a couple of IPv4 addresses, for example.

The length issue may become a problem for some other realms, with even longer addresses than IPv6, should such realms exist.
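The length arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the 3- and 4-byte option header sizes are the ones quoted in the text, and any padding the options may need is ignored.

```python
WORD_BYTES = 4             # one 32-bit unit
IHL_BITS = 4               # width of the IPv4 header length field
FIXED_HEADER_BYTES = 20    # constant-length part of the IPv4 header
SUA_HEADER_BYTES = 3       # per the text; field layout not assumed here
DUA_HEADER_BYTES = 4       # per the text; field layout not assumed here
IPV4_ADDR_BYTES = 4
IPV6_ADDR_BYTES = 16


def option_space() -> int:
    """Bytes available for all IPv4 options."""
    max_header = (2 ** IHL_BITS - 1) * WORD_BYTES  # 15 units = 60 bytes
    return max_header - FIXED_HEADER_BYTES


def address_budget() -> int:
    """Bytes left for address data once both option headers are paid for."""
    return option_space() - SUA_HEADER_BYTES - DUA_HEADER_BYTES


def ipv4_addresses_left(ipv6_in_one_ua: bool) -> int:
    """Whole IPv4 addresses that still fit, optionally after one IPv6 address."""
    budget = address_budget()
    if ipv6_in_one_ua:
        budget -= IPV6_ADDR_BYTES
    return budget // IPV4_ADDR_BYTES


print(option_space(), address_budget(),
      ipv4_addresses_left(False), ipv4_addresses_left(True))
```

With these figures the exact budget is 33 bytes, which matches the "about 8 units" quoted above, and a single IPv6 address in one UA still leaves room for four IPv4 addresses in total.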

Another issue concerning the use of the IPv4 Options, more pressing than the length issue, is the way packets containing IPv4 options are handled by the routers and firewalls in the Internet. From RFC 1812 “Requirements for IP Version 4 Routers” [3]:


4.2.2.6 Unrecognized Header Options: RFC 791 Section 3.1

A router MUST ignore IP options which it does not recognize.

This clearly states that unknown IP options should simply be ignored (i.e., passed on without any change). In reality, however, many routers and firewalls consider packets with IPv4 Options as second-class citizens, directing them on a so-called slow path25 or even dropping them altogether [31].

An article by P. Fransson and A. Jonsson [23] contains some measurements on the delay, jitter and most importantly the loss rate of packets with IPv4 Options vs. without them. The results support the belief that the usage of IP options reduces the probability of successful communications. Our own experiences suggest the same – we had great difficulties in establishing a test connection over the Internet using RASR, as some router or firewall in the university network blocked all of our test packets. We did find a suitable connection in the end, though (see Section 5.1.2).

Our belief is that most of the trouble related to IPv4 Options can be traced to the border ISP26 networks (i.e., to ISP routers, gateways and firewalls). Once within the Internet backbone27, even packets with unknown IP options may travel freely. This is a bold assumption, and we could not find any research results to support it. The problem with existing research is that it is done end-to-end, between hosts that are always some steps away from the actual backbone. The ideal situation would be to be able to directly connect to backbone links for experimentation. This, however, would require co-operation with the backbone operators.

4.5.2 Limitations of Our RASR Implementation

Support for IPv4 Only

The current RASR implementation supports only one network layer protocol, IPv4. The normal practice of software development is to first get the basic functionality up and running and then start adding features (e.g., support for new protocols). IPv4 was the natural choice to start with, as it is by far the most common network layer protocol in the current Internet architecture. Besides, RASR requires IPv4 support by definition in the root realm.

25 Slow path processing is handled by a software based router engine. The alternative, fast path processing, is hardware based and relies on cached routing information. In general, incoming packets are directed to the slow path if a suitable cache entry cannot be found. [43]

26 A border ISP provides Internet access for consumers.

27 We use the term Internet backbone to refer to the large capacity networks connecting border ISPs together.

Adding IPv6 support would have been the next logical step. Because of time constraints this was left out from the current implementation.

Reliance on RAW Sockets

RAW sockets are very useful in protocol development, as they allow the application to completely control the packet creation process. Thus, we did not need to modify any existing APIs to get RASR packets in the network.

The downside is that RAW sockets require root access. This is usually not a problem in the development phase, but it is unacceptable for production use. Root access should be given only when absolutely necessary, as it is always a security risk. And we are talking about a situation where it would have to be given frequently to all applications with RASR support.

Another problem is in providing the necessary logic in the application. If the protocol specification happened to change, every application would need to be updated separately.

Enabling NAT on Any RG Disturbs RASR Functionality

NAT and RASR do not work well together. NAT works by modifying the source and destination address of a packet. This may disturb the RASR functionality, and besides, RASR packets do not need NAT. What is needed, then, is for the NAT functionality on an RG to recognize RASR packets and leave them untouched. The current implementation does not work that way, so NAT has to be manually switched off during RASR communications.

TCP Connections Fail in Certain Scenarios

In scenarios where either the sender or the receiver is a root RG (i.e., so that the “hop over root RG” occurs), the connection fails due to malformed TCP packets. In spite of our debugging efforts, we have not been able to get rid of the problem. Luckily, the scenarios which are affected are not common.


RASR Packet Encapsulation Support Incomplete

We considered encapsulation as a solution to the problems related to the usage of IPv4 Options. More specifically, we considered IP-in-IP encapsulation [33], in which the packet is encapsulated with an extra IP header. This outermost header would not contain any IP Options, thus masking the problem. Unfortunately, it did not work – the packets were still being dropped. This is why the encapsulation support was left incomplete.

So, it seems that IP-in-IP encapsulated packets are not handled much better than packets containing IP Options. However, we think the reasons are slightly different. A router should only be concerned about the outermost IP header. Of course there may be a rule dropping all IP-in-IP encapsulated packets, but what would be the purpose? Firewalls, on the other hand, may well choose to drop IP-in-IP encapsulated packets due to their masking effect. Or, they may choose to check the inner packet, and drop it because of the IP options. The end effect is the same. So, while packets with IP options are being dropped on both routers and firewalls, IP-in-IP encapsulated packets should only be dropped by firewalls. We have no way of proving this, though.
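To make the idea of an option-free outer header concrete, the following Python sketch prepends a plain 20-byte IPv4 header with protocol number 4 (IPv4-in-IPv4) to an existing packet. It is an illustration only and not our kernel code; the function names and the example addresses are hypothetical, and fields such as identification and flags are simply left at zero.

```python
import socket
import struct

IPPROTO_IPIP = 4  # IANA protocol number for IPv4 encapsulated in IPv4


def checksum(data: bytes) -> int:
    """16-bit one's complement Internet checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def encapsulate(inner_packet: bytes, outer_src: str, outer_dst: str,
                ttl: int = 64) -> bytes:
    """Wrap inner_packet in an outer IPv4 header that carries no options."""
    version_ihl = (4 << 4) | 5  # IPv4, 5 units: fixed header only
    total_length = 20 + len(inner_packet)
    header = struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, 0, total_length,
        0, 0,                   # identification, flags/fragment offset
        ttl, IPPROTO_IPIP, 0,   # checksum filled in below
        socket.inet_aton(outer_src), socket.inet_aton(outer_dst),
    )
    header = header[:10] + struct.pack("!H", checksum(header)) + header[12:]
    return header + inner_packet
```

A router that inspects only the outermost header sees an IHL of 5 and therefore no options at all; anything that looks deeper still finds the original header, options included.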

Another possibility would be to use transport layer encapsulation. In fact, many protocols use this method of passing through firewalls, by encapsulating their packets inside UDP payload. UDP is perfect for such use, as it does not contain anything extra – it is a simple connectionless protocol. However, using transport layer encapsulation causes processing overhead, and to some extent defeats the purpose of RASR, so it is not a desirable solution. Still, it may be the only possibility of allowing RASR packets to traverse certain networks. More research on the subject is required.

4.6 Optimization

We have not put much thought into the performance of our implementation. The time for that is later. Still, we will present here some possible optimizations.

In the basic RASR functionality, every packet goes through the root realm. This happens even if the sender and receiver reside in the same realm. So, a logical optimization would be to always use the shortest path possible. A way to do this would be to compare the UAs of the sender and receiver. Figure 4.4 illustrates the issue. The common part of the UAs tells where a shortcut is possible.
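The comparison itself is a longest-common-prefix computation. The sketch below assumes, purely for illustration, that a UA is modelled as a list of addresses ordered from the root realm towards the host; the function name and the example addresses are hypothetical.

```python
from typing import List, Sequence


def common_prefix(sua: Sequence[str], dua: Sequence[str]) -> List[str]:
    """Return the leading addresses shared by two UAs.

    The last shared element identifies the gateway below which the
    two paths diverge, i.e. the highest point a packet has to reach.
    """
    shared: List[str] = []
    for a, b in zip(sua, dua):
        if a != b:
            break
        shared.append(a)
    return shared


sender = ["130.233.1.1", "10.0.0.1", "192.168.1.5"]
receiver = ["130.233.1.1", "10.0.0.1", "192.168.1.9"]
print(common_prefix(sender, receiver))
```

Comparing prefixes rather than individual addresses matters because private addresses can repeat in different realms: two equal addresses are only meaningful if everything above them is equal too.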

The problem is that the sender may not know its own UA, and neither do the RGs until the root realm is reached. Thus, no comparison is possible. However, as a host receives a packet it learns its own UA from the DUA. So, it may be necessary to send the first packet through the root realm but after that the route can be optimized. In cases where both the sender and receiver reside in the same realm, this is easy enough to implement, but for route optimizations done by the RGs the issue is more complicated.

Figure 4.4: The Universal Address (UA) construction illustrated.

As an RG does not normally act as an endpoint for RASR communications, it has no way of knowing its own UA. It could store information about packets passing through it, waiting for the reply from which it could parse its own UA, but this would make it more like a NAT box, and require resources we would rather not waste on that.

Another possibility would be for the RG to do a “UA query” from time to time, asking for the UA of the next RG closer to the root realm. Then it would simply add its own address to that UA to form its own UA. If the RG above did not know its own UA either, it would query the next RG closer to the root realm. This would continue recursively, until a root RG receiving the query would simply return its public IPv4 address, which is its UA. After finding out its own UA, an RG would cache it so that new queries are not required all the time.
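The recursive query with caching can be sketched in a few lines. This is a toy model rather than a protocol design: the class and attribute names are hypothetical, no messages are actually exchanged, and cache expiry is left out.

```python
from typing import List, Optional


class RealmGateway:
    """Toy model of an RG that learns its UA by asking the RG above it."""

    def __init__(self, address: str,
                 parent: Optional["RealmGateway"] = None) -> None:
        self.address = address          # address of this RG in the realm above
        self.parent = parent            # next RG closer to the root realm
        self._cached_ua: Optional[List[str]] = None
        self.queries_received = 0       # bookkeeping for the example only

    def query_ua(self) -> List[str]:
        """Answer a UA query, resolving recursively on the first call."""
        self.queries_received += 1
        if self._cached_ua is None:
            if self.parent is None:
                # A root RG: its public IPv4 address is its UA.
                self._cached_ua = [self.address]
            else:
                self._cached_ua = self.parent.query_ua() + [self.address]
        return list(self._cached_ua)


root = RealmGateway("82.130.34.117")
middle = RealmGateway("10.0.0.1", parent=root)
leaf = RealmGateway("192.168.2.1", parent=middle)
print(leaf.query_ua())
```

After the first call every RG on the path holds its own UA, so later queries are answered locally without travelling towards the root realm again.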

Other, smaller optimizations would most likely deal with the implementation code itself. As an example, let us consider the code that actually builds the SUA on RGs as the packet travels towards the root realm. The packet, while inside the networking stack, is stored within a kernel data structure called sk_buff (see Appendix D for the definition). This data structure has allocated a certain space from the memory for the packet. However, the process of building SUA requires extra memory, as a new address is being added to it. In other words, more memory has to be allocated for the packet. There are kernel functions that help with this, but in any case it is a fairly extensive operation, one where there undoubtedly is room for improvement.

Originally we did this a bit differently, by reserving space for the whole SUA right from the start. This was achieved by adding empty NOOP options between the SUA and DUA options. Then, as the SUA grew, we simply overwrote the NOOPs. The advantage of this approach is that no memory reallocation is required. On the other hand, in this approach the maximum length for both SUA and DUA is predefined. As in most scenarios the whole SUA and DUA are of different lengths, and there is not that much extra space to spare (see Section 4.5.1 for more details), a more dynamic approach is needed.
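The NOOP approach can be illustrated on a plain byte buffer. The sketch below is not our kernel code. The NOOP value (1) is the standard IPv4 No Operation option, but the SUA and DUA option numbers are invented for the example, and the 3-byte SUA header is assumed to be laid out as type, length and one spare byte.

```python
NOOP = 0x01          # RFC 791 "No Operation" option, one byte long
SUA_TYPE = 0x9E      # hypothetical option number, for illustration only
DUA_TYPE = 0x9F      # hypothetical option number, for illustration only
SUA_HEADER_LEN = 3   # 3 bytes per the text; the field layout is assumed
ADDR_LEN = 4         # one IPv4 address


def build_options(reserved_addresses: int, dua_option: bytes) -> bytearray:
    """Empty SUA option, NOOP padding for future addresses, then the DUA."""
    sua_header = bytes([SUA_TYPE, SUA_HEADER_LEN, 0])  # type, length, spare
    padding = bytes([NOOP]) * (reserved_addresses * ADDR_LEN)
    return bytearray(sua_header + padding + dua_option)


def append_sua_address(options: bytearray, address: bytes) -> None:
    """Grow the SUA in place by overwriting the four NOOPs that follow it."""
    if len(address) != ADDR_LEN:
        raise ValueError("expected a 4-byte IPv4 address")
    sua_len = options[1]
    slot = options[sua_len:sua_len + ADDR_LEN]
    if slot != bytes([NOOP]) * ADDR_LEN:
        raise ValueError("no reserved space left for the SUA")
    options[sua_len:sua_len + ADDR_LEN] = address
    options[1] = sua_len + ADDR_LEN
```

The buffer never changes size, which is the whole attraction of the approach, and the `ValueError` raised once the padding is used up corresponds to the predefined maximum length discussed above.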

Chapter 5

Evaluation

In this chapter we aim to evaluate the RASR proposal further, by testing our RASR implementation against the current best practice solution (i.e., specific routing rules, NAT and port mapping). The chapter consists of two parts: first, the testbeds used in the evaluation are presented in Section 5.1, followed by the presentation of the actual test cases and the test results in Section 5.2. The analysis of the results is left to Chapter 6.

5.1 Testbeds

This section describes the testbeds used to gather performance metrics about the proof-of-concept RASR implementation, which was presented in the previous chapter, Chapter 4. For information about development phase testing, see Section 4.4.

Both testbeds include virtual UML hosts as parts of the network. Their usage may show in the results as a delaying factor. However, it should affect each test the same way, no matter whether RASR is used or not, thus keeping the results comparable.

5.1.1 Testbed 1: Private Testbed, RASR on Embedded Devices

Our private testbed is shown in Figure 5.1. An embedded device with RASR support, a WLAN router, is included in the testbed, acting as an RG. Thus, we have quite a heterogeneous scenario, consisting of different types of devices.

There is even the possibility for wireless communications, but we did our tests utilizing only the wired router functionality of the WLAN router. The main reason for this was a set of issues we had with SSH communications over the wireless link. In any case, wired communication is more reliable, and after all, RASR functions on the network layer (i.e., the IP layer) and does not care about the underlying physical layer (i.e., the distinction between wired and wireless communications).

Figure 5.1: Testbed 1 – A private RASR testbed.

Routing tables for each node are given in Appendix F. Relevant firewall rules are listed in Appendix G. NAT was not used, as specific routing rules provided necessary connections to do the reference tests (i.e., tests without RASR).

5.1.2 Testbed 2: A Connection Over the Internet

Our “public” testbed is shown in Figure 5.2.

Routing tables for each node, except for the router/gateway nodes in the root realm, are given in Appendix F. Relevant firewall and NAT rules are listed in Appendix G.


Figure 5.2: Testbed 2 – A connection over the Internet.

5.2 RASR Implementation Performance Metrics

This section introduces the test cases, as well as the test results. We use two applications to perform the tests, ping and GNU Netcat, both modified to support RASR. We have discussed the modifications in Sections 4.3.2 and 4.3.3 for ping and netcat, respectively. In addition to the test cases utilizing RASR, these modified applications are also used in the reference test cases (i.e., test cases utilizing specific routing rules and NAT), to keep the results comparable. This is possible, as the modifications do not prevent the usage of traditional addresses (i.e., IPv4 addresses).

We aim to provide reliable results by repeating the tests when applicable. It has to be noted, though, that some variance in the results is always expected, as the IP network is dynamic in nature.


5.2.1 Reachability Using Ping

The test cases are listed in Table 5.1. Test cases A through D consist of 100 pings each, and each of them is performed 10 times. Test case E consists of just 10 ’normal’ pings (i.e., without the RASR options) and 10 RASR pings against each remote host. The ping method used is ICMP Echo with 56 bytes of data.

Test Case A   Private testbed (Figure 5.1) without RASR.
              Source: UM1, destination: UM2.

Test Case B   Private testbed (Figure 5.1) with RASR.
              Source: UM1, destination: UM2.

Test Case C   Connection over the Internet (Figure 5.2) without RASR.
              Source: UM1, destination: Root UM2.

Test Case D   Connection over the Internet (Figure 5.2) with RASR.
              Source: UM1, destination: Root UM2.

Test Case E   Global reachability. Network layout as in Figure 5.1.
              Source: Root RG
              Destination: 50 web servers around the world

Table 5.1: The ping test cases.

Test case E is special, as it involves communicating with hosts that do not support RASR. However, we have noticed that even when a host does not recognize the RASR options, it may still mirror them in the reply packets (e.g., ICMP replies).

Even though the mirrored options do not contain the correct information (i.e., the SUA and DUA options have not been swapped, as they should have been), we at least know that the packets have reached their destination. Should that destination be upgraded to support RASR, we would have a working RASR connection.

The results are presented in Tables 5.2 and 5.3. The first table, Table 5.2, shows the minimum, maximum and average round-trip times (RTTs)1 for test cases A through D, as reported by the ping utility. The last column, standard deviation, tells how much the RTTs varied.

1 Round-trip time (RTT) measures the time it takes for a packet to first reach its destination and then for the reply to reach the original sender. In other words, it measures both the travel time due to the distance and the delays due to processing done in various network elements en route between the sender and the recipient. It does not, however, provide any insights as to which network element caused the biggest delay, for example.

              min      max        average   standard deviation
Test Case A   0.759    5.058      1.235     0.409
Test Case B   1.756    6.510      2.429     0.479
Test Case C   4.158    18.428     4.785     0.880
Test Case D   9.380    173.002    22.605    25.574

Table 5.2: The ping test results. Values are RTTs in milliseconds (ms).
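For reference, the four columns of Table 5.2 can be computed from a list of RTT samples as sketched below. The function name is hypothetical, the sample values are made up for the example, and we assume here the population form of the standard deviation.

```python
import math
from typing import Sequence, Tuple


def rtt_summary(rtts: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (min, max, average, standard deviation) of RTTs in ms."""
    n = len(rtts)
    average = sum(rtts) / n
    variance = sum((x - average) ** 2 for x in rtts) / n  # population form
    return min(rtts), max(rtts), average, math.sqrt(variance)


# Made-up samples: nine quiet round trips and one disturbed one.
samples = [1.0] * 9 + [11.0]
print(rtt_summary(samples))
```

In this made-up series a single slow reply sets the maximum to eleven times the typical value, while the average only doubles.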

The second result table, Table 5.3, contains selected results for our global reachability test (test case E). Full results can be found in Appendix H. For this test, we selected 50 university web servers around the world, based on the Academic Ranking of World Universities 2007 2. Our selection is admittedly small, and as such the results from this test cannot be used to draw any definite conclusions.

For brevity, only servers that not only responded to our “RASR ping” packets (i.e., ICMP queries including RASR options), but also mirrored the RASR options in reply packets, are listed in Table 5.3.

server                                                        normal ping   RASR ping
Tech Univ Helsinki (Finland), www.tkk.fi (130.233.240.9)      2.447         12.316
Univ Athens (Greece), www.uoa.gr (195.134.100.100)            84.358        132.707
Univ Oslo (Norway), www.uio.no (129.240.4.44)                 15.402        29.969
Jagiellonian Univ (Poland), www.uj.edu.pl (149.156.89.139)    78.633        94.282
Univ Cambridge (UK), www.cam.ac.uk (131.111.8.46)             47.882        110.418
Univ Edinburgh (UK), www.ed.ac.uk (129.215.13.199)            51.663        93.112

Table 5.3: The ping test results for test case E. Values are average RTTs in milliseconds (ms).

2 http://www.arwu.org/rank/2007/ranking2007.htm


5.2.2 File Transfer Using Netcat

The test cases used in this test are introduced in Table 5.4. We used netcat, a general purpose TCP/UDP communications utility, as our test application in TCP mode. File transfer with netcat is possible simply by piping the input from a file instead of stdin3 on the sending end and piping the output to a file instead of stdout4 on the receiving end.

Test Case A   Private testbed (Figure 5.1) without RASR.
              Source: UM1, destination: UM2.

Test Case B   Private testbed (Figure 5.1) with RASR.
              Source: UM1, destination: UM2.

Test Case C   Connection over the Internet (Figure 5.2) without RASR.
              Source: UM1, destination: UM2.

Test Case D   Connection over the Internet (Figure 5.2) with RASR.
              Source: UM1, destination: UM2.

Table 5.4: The Netcat file transfer test cases.

The idea was to use random content files with differing file sizes as the test data. Unfortunately, as we tried to move the files as fast as the application allows (i.e., in big chunks), the RASR implementation completely froze, making this type of test impossible to perform.

Instead, we decided to use a single test file containing 103,696 bytes of random numbers and letters, divided into 80-character lines. The lines are separated with the line feed (LF) character5. The following is a sample line from the test file.

jpH2h9TFgrSg3KjWRXKm3X2VuP92ONVieZ77DnFFLH5g8UBotTNimBrmDQPWeH4ZJWfd4HBgYgUr5Lh

The file was transferred one line at a time, keeping the data part of each packet at 80 bytes. Still, transferring the test file with RASR enabled did not go problem-free if the packets were sent too frequently. Due to this we used the ’-i’ option of netcat to introduce a delay between each sent packet.

3 stdin is the standard input stream (i.e., where an application gets its input), usually the keyboard.

4 stdout is the standard output stream (i.e., where an application puts its output), usually the screen.

5 ASCII code for the LF character is 10. Unix/Linux systems commonly use this character to separate lines in text files.


However, the delay was defined in seconds, which would have led to an extremely slow file transfer. We therefore modified the application code so that the delay is defined in microseconds (usec).
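The sending pattern we ended up with (one line per write, with a configurable microsecond pause in between) can be sketched as follows. This is an illustration in Python rather than the modified netcat code, and the function name is hypothetical. Note also that with TCP one write per line does not strictly guarantee one packet per line.

```python
import socket
import time


def send_lines(sock: socket.socket, data: bytes, delay_usec: int) -> int:
    """Send data one line at a time, pausing delay_usec between lines.

    Returns the number of lines written to the socket.
    """
    sent = 0
    for line in data.splitlines(keepends=True):
        sock.sendall(line)
        sent += 1
        if delay_usec > 0:
            time.sleep(delay_usec / 1_000_000)
    return sent
```

A caller would open a TCP connection to the receiving netcat and pass the connected socket together with the contents of the test file; `delay_usec=0` corresponds to the "no limit" case.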

We measured the transfer times using the GNU time6 utility with default options. The measuring results are presented in Table 5.5.

It should be noted that the measured times are from single tests and as such provide only limited value. Furthermore, breaking the file into such small packets and introducing the delay between each packet makes the test a poor estimate of file transfer performance.

For test case B (i.e., private testbed with RASR) we could not get any results, as the file transfer froze after exactly 2,048 bytes (i.e., about 50-60 packets including the acknowledgements, ACKs) regardless of the delay. And for all cases, introducing the delay led to an incomplete data transfer (i.e., a couple of lines from the end of the file were not transmitted). However, this happened regardless of whether RASR was used or not, so we consider it an issue with the test application, not the RASR implementation itself.

We will analyze the results, and the file transfer issue in general, in the next chapter, Chapter 6.

              no limit   20 usec   2000 usec   200,000 usec
Test Case A   10.119     43.058    42.999      306.263
Test Case B   -          -         -           -
Test Case C   10.519     54.039    43.887      306.027
Test Case D   -          41.416    46.203      295.758

Table 5.5: The file transfer test results. Values are transfer times in seconds (s).

6 http://www.gnu.org/software/time/

Chapter 6

Discussion

In this chapter we discuss further the solutions introduced throughout the thesis. The focus is on RASR, which was presented in Chapters 3 and 4. The test results from Chapter 5 form the basis for our discussion.

We begin by analyzing the test results in Section 6.1. Based on the analysis and theoretical information about RASR and other solutions, we perform a comparison of solutions in Section 6.2.

We finish the discussion in Section 6.3 with one fundamental question: Is the future Internet going to be based on the current architecture (i.e., utilizing one of the solutions presented here, or something close to them at least) or on a completely new design?

6.1 RASR Implementation Performance Analysis

6.1.1 Reachability (Ping)

We performed ping testing in three different environments:

1. On a closed testbed, endpoints RASR-aware (Testbed 1.).

2. Over the Internet, endpoints RASR-aware (Testbed 2.).

3. Over the Internet, remote hosts having no RASR support.

For each environment we defined two test cases: the actual RASR functionality testing and a reference test case without RASR. The measured variable was round-trip time (RTT), which gives us an idea of the network latency for each test case. Based on the measurements, we calculated the minimum, maximum and average RTTs, along with the standard deviation of RTTs, for the first two environments. In the last environment we did not do such extensive testing, as we used public hosts, university web servers, as the remote hosts. So, for that environment, we only calculated the average RTT. The results are listed in Tables 5.2 and 5.3 and in Appendix H.

The minimum and maximum values show the limits between which the RTTs vary. Due to the highly dynamic nature of IP networking, the importance of these values is lessened – a single disruption on route shows in the limits, even though its effect might not be long-lasting. Thus, average and standard deviation are more interesting values to us, as they are not as greatly affected by abrupt changes in the data – they better align with the bigger picture.

As for the results themselves, test cases involving the closed testbed are not that interesting. Whether RASR is used or not seems not to matter much. The average RTT is about 1.2 ms greater when RASR is used, apparently due to the processing overhead caused by the RASR functionality, but that is to be expected. Standard deviation is slightly higher for the RASR case as well, but not enough to draw any conclusions. For comparison, Figure 6.1 shows one test run from both test cases side by side. The higher average RTT for RASR ping is evident, and there might be slightly more variance in the profile of RASR ping RTTs hinting towards the slightly higher standard deviation.

Things get a lot more exciting when we involve a realistic network environment, the Internet. The minimum RTT is about 5 ms higher for the RASR case, setting a reference value for the processing overhead. The maximum RTT, on the other hand, is about 150 ms higher for the RASR case, about tenfold when compared to the reference case. Of course this could just be due to passing disturbances on the network connections, and in fact the difference in average RTTs is a lot more acceptable, around 18 ms. The most telling value, however, is the standard deviation, for the RASR case 25 ms, compared to the 1 ms for the reference case. This means that the RASR case was susceptible to RTT changes on a completely different scale than the reference case. Looking at Figure 6.2, a graphical representation from one test run, we easily come to the same conclusion.

The term used to refer to the RTT variations is jitter. In general, the more and bigger the variations, the bigger the jitter. Jitter negatively affects many real-time two-way network services, like Internet gaming and telephony. One-way services like streaming (i.e., sound and/or video) are also affected, but the effects can be mitigated through buffering.

[Plot: round-trip time (RTT) in milliseconds (ms) against ICMP sequence no.; series: normal ping and RASR ping.]

Figure 6.1: Normal ping vs. RASR ping on a private testbed.

Based on the result table in Appendix H, in our global reachability test (i.e., ping against remote hosts having no RASR support), 32 out of 50 responded to our normal ping packets, forming the reference group. Out of those 32, 14 responded to our RASR ping packets (i.e., packets containing RASR options). And finally, six of those 14 mirrored the RASR options in their reply packets.

As an example, we display our ping test results against two remote hosts that did mirror the RASR options and as such would be perfect candidates for RASR hosts. Figure 6.3 illustrates the test against our university web server, and Figure 6.4 against the University of Athens web server.

The figures show the same type of increase in jitter for the RASR case as we have witnessed already. Of course these tests only consisted of ten ping packets per test, which is really not enough to give unbiased results. Still, the trend is clear.


[Plot: round-trip time (RTT) in milliseconds (ms) against ICMP sequence no.]


Figure 6.2: Normal ping vs. RASR ping over the Internet.

6.1.2 File Transfer (Netcat)

File transfer with RASR seems to suffer if the packets are being sent too frequently. For connections over the Internet, the limit appears to be somewhere between 150 and 200 ms. With 150 ms, retransmission packets start appearing on the network, as the example given in Table 6.1 shows, suggesting a missing packet somewhere.

The example was collected on the receiving end of the connection. We notice both TCP Retransmission and TCP Dup ACK packets, indicating that on this end everything was OK – the original packet (e.g., no. 41) was received without problems and an ACK sent back (e.g., no. 42). However, this ACK apparently never reached the original sender, as a TCP Retransmission is sent (e.g., 43), received and a new ACK sent back (e.g., 44 – called TCP Dup ACK for the obvious reason that it is a duplicate as far as the host under observation is concerned).
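The way such retransmissions stand out in a capture can be sketched with a few lines of Python: a data segment whose sequence number and length have already been seen from the same sender is a retransmission. This is a simplification of what a protocol analyzer does, and the function name is hypothetical. The sample values are the sender-side rows 2, 4, 5, 41, 43 and 45 of Table 6.1.

```python
from typing import Iterable, List, Tuple

# (packet number, sequence number, payload length), one direction only
Segment = Tuple[int, int, int]


def find_retransmissions(segments: Iterable[Segment]) -> List[int]:
    """Return the packet numbers of data segments that repeat an earlier one."""
    seen = set()
    retransmitted: List[int] = []
    for number, seq, length in segments:
        if length == 0:
            continue  # pure ACK or SYN, carries no data to repeat
        key = (seq, length)
        if key in seen:
            retransmitted.append(number)
        else:
            seen.add(key)
    return retransmitted


sender_side = [
    (2, 0, 0), (4, 1, 0), (5, 1, 80),
    (41, 1361, 80), (43, 1361, 80), (45, 1441, 80),
]
print(find_retransmissions(sender_side))
```

For the rows used here only packet no. 43 is reported, since it repeats the 80 bytes starting at sequence number 1361 that packet no. 41 already carried.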

However, the existence of retransmissions does not by itself mean that the connection will fail. It simply indicates a degradation of service. TCP will make sure that the data is successfully sent in the end. So, dropping the delay between packets even far below the 150 ms mentioned earlier will not necessarily mean the connection will fail. As the results of Test Case D in Table 5.5 show, as low as 20 usec (i.e., 0.020 ms) can still provide working communications. However, the smaller the delay the more frequent are the missing packets and the resulting retransmissions. It might even lead to a connection failure in some circumstances.

[Plot: round-trip time (RTT) in milliseconds (ms) against ICMP sequence no.]

Figure 6.3: Reachability test (ping) against www.tkk.fi (130.233.240.9).

We were unable to get any measurable results from file transfer on our private testbed using RASR. The transmission simply froze after 2,048 bytes regardless of the delay between packets. This, however, most likely has more to do with our test application than the RASR implementation itself, as interactive connections (i.e., input from keyboard instead of a file) work just fine for a lot more than 2 kB worth of data. Unfortunately, due to time and resource constraints we could not further analyze the reason behind this issue.


Figure 6.4: Reachability test (ping) against www.uoa.gr (195.134.100.100). The plot shows the round-trip time (RTT) in milliseconds (ms), on a scale from 80 to 200, against the ICMP sequence no. (0 to 9).

6.2 Solution Comparison

To sum up the features and requirements of the solutions covered in this thesis and to aid in their analysis, we have attached a comparison table as Appendix I. The table lists the following information about each solution.

• Modified entities – Network entities (e.g., host, router, gateway, NAT or DNS) that would need modifications in order to implement the functionality of the solution in question.

• New entities – New entities that would have to be added to the network.

• Identity – How is a host identified?

• Location – Where is the location information of a host stored?

• Implemented – Is there a working implementation of the solution? Not all aspects of the solution have to be implemented, as long as the basic functionality is provided.

• Protocol specific – Is the solution specific to a single protocol (e.g., IPv4) or a defined group of protocols (e.g., IPv4/6)?

No.  Time      Source         Destination    Info
2    1.542108  82.130.34.117  192.168.2.2    50620 > rrac [SYN] Seq=0 Len=0 ...
3    1.542416  192.168.2.2    82.130.34.117  rrac > 50620 [SYN, ACK] Seq=0 Ack=1 ... Len=0 ...
4    1.552518  82.130.34.117  192.168.2.2    50620 > rrac [ACK] Seq=1 Ack=1 ... Len=0 ...
5    1.555836  82.130.34.117  192.168.2.2    50620 > rrac [PSH, ACK] Seq=1 Ack=1 ... Len=80 ...
6    1.556060  192.168.2.2    82.130.34.117  rrac > 50620 [ACK] Seq=1 Ack=81 ... Len=0 ...
...
41   4.533748  82.130.34.117  192.168.2.2    50620 > rrac [PSH, ACK] Seq=1361 Ack=1 ... Len=80 ...
42   4.539920  192.168.2.2    82.130.34.117  rrac > 50620 [ACK] Seq=1 Ack=1441 ... Len=0 ...
43   4.805682  82.130.34.117  192.168.2.2    [TCP Retransmission] 50620 > rrac [PSH, ACK] Seq=1361 Ack=1 ... Len=80 ...
44   4.805887  192.168.2.2    82.130.34.117  [TCP Dup ACK 42#1] rrac > 50620 [ACK] Seq=1 Ack=1441 ... Len=0 ...
45   4.814925  82.130.34.117  192.168.2.2    50620 > rrac [PSH, ACK] Seq=1441 Ack=1 ... Len=80 ...
46   4.815067  192.168.2.2    82.130.34.117  rrac > 50620 [ACK] Seq=1 Ack=1521 ... Len=0 ...
47   5.073292  82.130.34.117  192.168.2.2    [TCP Retransmission] 50620 > rrac [PSH, ACK] Seq=1441 Ack=1 ... Len=80 ...
48   5.073463  192.168.2.2    82.130.34.117  [TCP Dup ACK 46#1] rrac > 50620 [ACK] Seq=1 Ack=1521 ... Len=0 ...
...

Table 6.1: File transfer with RASR using netcat (150 ms send delay).

Of the fifteen solutions listed in the table, RASR included, only MIDCOM (Section 2.3.3) and NUTSS (Section 2.4.4) do not require any changes to the host OS. Applications are another matter.

Five solutions add new entities to the network. On the other hand, those solutions in general require fewer existing entities to be modified. The result is about the same: whether existing or new, on average two to three types of entities are affected. The solutions with the most fundamental changes, IPv6 (Section 2.2) and Nimrod (Section 2.4.3), are the only ones that require virtually every network entity to be modified.

Identity vs. location reflects the structure of Chapter 2.

Most of the solutions we have covered do have at least a proof-of-concept implementation, but for five solutions we could not find any. This makes estimating the feasibility of those solutions harder.

The final column, protocol specific, tells us whether the solution is designed with only a defined set of protocols in mind or whether it can be extended to support new protocols. The result is about 50-50. The more general solutions have the advantage here, as due to the ever-changing technology it is not a good idea to design anything with only a specific target in mind. Protocols change, and after IPv4 is gone the IPv4-specific solutions would be history as well. On the other hand, after IPv4 is gone, so are NAT and the related problems, we hope. In that scenario almost every solution here would lose its meaning.

6.3 The Future Internet – Based on the Current Architecture or a Completely New Design?

The fundamental view throughout this thesis has been that the current Internet architecture should be the basis for the future Internet. The proposals we have presented suggest varying amounts of change to the architecture, some even quite fundamental, but only changes nonetheless.

There is, however, yet another approach to consider. It is called the clean slate approach. As the name suggests, in the clean slate approach we drop the current Internet architecture altogether as obsolete, go back to the drawing board, and design the future Internet from scratch. In other words, we start from a clean slate. It is a bold approach, but one worth considering. Research is currently being done in the area [20].

It is not possible, however, to simply flip a switch and move everybody from the current Internet to the new one. It would be foolish to even consider that the current Internet could be removed. Many great ideas are lost because they try to take it all, assuming that it is possible to completely replace an old technology with a new one simply because it is better. The world just does not work that way. For example, vinyl records are still used and being made, even though digital recordings are of far superior quality.

A completely redesigned Internet might well provide the best possible future Internet. But it will not happen if everybody is expected to move from the old to the new. Instead, the old Internet should be left running in parallel with the new Internet. This new Internet, redesigned and free of the problems known to exist in the current architecture, would be open to those tired of the old problems. The question is, of course, who will move?

Chapter 7

Conclusions and Future Work

With the Internet gaining more and more popularity and users, it is only a matter of time before the current architecture ceases to function properly. Estimates vary, but the current IPv4 address space is estimated to be exhausted by 2011 [1], and at least one plan puts the large-scale transition to IPv6 in that same time frame [14].

In the meantime, solutions that are not as radical as IPv6 are being actively researched and definitely have potential. Among them is RASR, the main research focus of this thesis. Compared to the solutions presented in Chapter 2, RASR competes with generality (i.e., it is not tied to a single underlying protocol like IPv4). Furthermore, RASR aims to lower the threshold of adopting the solution by continuing to support legacy applications (i.e., it does not break existing network connectivity).

Through our tests in Chapter 5 we have shown that RASR works not only in a closed testbed environment but over the Internet as well. There are limitations, however. The main problems appear to be the increase in jitter and packet loss. Without these problems, RASR would compete in performance with the current best practice (i.e., NAT). There is some performance overhead due to the handling of RASR options, but it is insignificant compared to the other issues. Also, we did not aim to produce a high-performance implementation at this point, but a working one. Optimizations are left as future work.

Our RASR implementation is far from ready. The following is a list of some aspects that still need work.

• The current implementation only supports IPv4. Support for other network layer protocols needs to be added, for IPv6 at the very least.


• On the sender side, the existing socket APIs need to be modified to either allow the UA to be passed on to kernel space for processing or to process the UA themselves and initiate the RASR options.

• Support for DNS (e.g., for resolving UAs and handling mobility).

• Better handling of special network scenarios. For example, let us consider a scenario where either the sender or the receiver is a root RG. TCP connections in this kind of scenario currently fail (see Section 4.5.2 for more details).

• Better application support.

• Optimizations. See Section 4.6 for more details.

Furthermore, a lot more practical testing and measurements are needed in different environments, especially over

• long-distance connections and

• different types of network elements (i.e., gateways and routers).

Bibliography

[1] IPv4 address report (potaroo.net). Auto-generated daily. Referenced on Nov. 30, 2007. Available online at http://www.potaroo.net/tools/ipv4/.

[2] RIPE NCC service region Hostcount vs. Hostcount++, 2007. Available online at http://www.ripe.net/hostcount/hostcount++/2007/<06,07,08,09,10>/cmp/all-tld-hosts.html.

[3] Baker, F. Requirements for IP version 4 routers. RFC 1812, Internet Engineering Task Force, June 1995. Available online at http://www.rfc-editor.org/rfc/rfc1812.txt.

[4] Balakrishnan, H., Lakshminarayanan, K., Ratnasamy, S., Shenker, S., Stoica, I., and Walfish, M. A layered naming architecture for the Internet. In SIGCOMM ’04: Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications (New York, NY, USA, 2004), ACM Press, pp. 343–352.

[5] Benvenuti, C. Understanding Linux Network Internals. O’Reilly Media, Inc., 2006.

[6] Berners-Lee, T., Fielding, R., and Masinter, L. Uniform resource identifier (URI): generic syntax. RFC 3986, Internet Engineering Task Force, Jan. 2005. Available online at http://www.rfc-editor.org/rfc/rfc3986.txt.

[7] Borella, M., Lo, J., Grabelsky, D., and Montenegro, G. Realm Specific IP: framework. RFC 3102, Internet Engineering Task Force, Oct. 2001. Available online at http://www.rfc-editor.org/rfc/rfc3102.txt.

[8] Campbell, D. IPv6: The calm before the storm? Network World 24, 5 (2007), 17.


[9] Carpenter, B., and Moore, K. Connection of IPv6 domains via IPv4 clouds. RFC 3056, Internet Engineering Task Force, Feb. 2001. Available online at http://www.rfc-editor.org/rfc/rfc3056.txt.

[10] Castineyra, I., Chiappa, N., and Steenstrup, M. The Nimrod routing architecture. RFC 1992, Internet Engineering Task Force, Aug. 1996. Available online at http://www.rfc-editor.org/rfc/rfc1992.txt.

[11] Chandrashekar, J.; Hou, Y. Z.-l. Z. A unifying infrastructure for Internet services. In Communications, 2002. ICC 2002. IEEE International Conference on (Apr. 2002), pp. 2641–2646.

[12] Clark, D., Braden, R., Falk, A., and Pingali, V. FARA: reorganizing the addressing architecture. In FDNA ’03: Proceedings of the ACM SIGCOMM workshop on Future directions in network architecture (New York, NY, USA, 2003), ACM Press, pp. 313–321.

[13] Crowcroft, J., Hand, S., Mortier, R., Roscoe, T., and Warfield, A. Plutarch: an argument for network pluralism. In FDNA ’03: Proceedings of the ACM SIGCOMM workshop on Future directions in network architecture (New York, NY, USA, 2003), ACM Press, pp. 258–266.

[14] Curran, J. An Internet transition plan. Internet-draft, Internet Engineering Task Force, Aug. 2007. Work in Progress. Available online at http://www.ietf.org/internet-drafts/draft-jcurran-v6transitionplan-01.txt.

[15] Deering, S., and Hinden, R. Internet Protocol, version 6 (IPv6) specification. RFC 2460, Internet Engineering Task Force, Dec. 1998. Available online at http://www.rfc-editor.org/rfc/rfc2460.txt.

[16] Dike, J. User Mode Linux. Bruce Perens’ Open Source Series. Pearson Education, Inc., 2006.

[17] Dixon, T. Comparison of proposals for next version of IP. RFC 1454, Internet Engineering Task Force, May 1993. Available online at http://www.rfc-editor.org/rfc/rfc1454.txt.

[18] Droms, R. Dynamic host configuration protocol. RFC 2131, Internet Engineering Task Force, Mar. 1997. Available online at http://www.rfc-editor.org/rfc/rfc2131.txt.

[19] Egevang, K., and Francis, P. The IP network address translator (NAT). RFC 1631, Internet Engineering Task Force, May 1994. Available online at http://www.rfc-editor.org/rfc/rfc1631.txt.




[20] Feldmann, A. Internet clean-slate design: what and why? SIGCOMM Comput. Commun. Rev. 37, 3 (2007), 59–64.

[21] Francis, P. Is the Internet going NUTSS? Internet computing 7, 6 (2003), 94–96.

[22] Francis, P., and Gummadi, R. IPNL: A NAT-extended internet architecture. In SIGCOMM ’01: Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications (New York, NY, USA, 2001), ACM Press, pp. 69–80.

[23] Fransson, P., and Jonsson, A. End-to-end measurements on performance penalties of IPv4 options. In Global Telecommunications Conference, 2004. GLOBECOM ’04. IEEE (Nov. 2004), pp. 1441–1447.

[24] Fuller, V., and Li, T. Classless Inter-Domain Routing (CIDR): The Internet address assignment and aggregation plan. RFC 4632, Internet Engineering Task Force, Aug. 2006. Available online at http://www.rfc-editor.org/rfc/rfc4632.txt.

[25] Golding, D. Do we really need to have IPv6 when NAT conserves address space and aids security? Computer Weekly (Feb. 2006), 40–41.

[26] Guha, S., Takeda, Y., and Francis, P. NUTSS: a SIP-based approach to UDP and TCP network connectivity. In FDNA ’04: Proceedings of the ACM SIGCOMM workshop on Future directions in network architecture (New York, NY, USA, 2004), ACM Press, pp. 43–48.

[27] Hancock, R., Karagiannis, G., Loughney, J., and Van den Bosch, S. Next Steps in Signaling (NSIS): Framework. RFC 4080, Internet Engineering Task Force, June 2005. Available online at http://www.rfc-editor.org/rfc/rfc4080.txt.

[28] Huitema, C. Teredo: Tunneling IPv6 over UDP through Network Address Translations (NATs). RFC 4380, Internet Engineering Task Force, Feb. 2006. Available online at http://www.rfc-editor.org/rfc/rfc4380.txt.

[29] Latvala, M. System for combining networks of different addressing schemes. International patent publication WO 2007/093893 A1, Nokia Corp., Aug. 2007.

[30] Love, R. Linux Kernel Development. Pearson Education, Inc., 2005.





[31] Medina, A., Allman, M., and Floyd, S. Measuring the evolution of transport protocols in the Internet. SIGCOMM Comput. Commun. Rev. 35, 2 (2005), 37–52.

[32] Moskowitz, R., and Nikander, P. Host Identity Protocol (HIP) architecture. RFC 4423, Internet Engineering Task Force, May 2006. Available online at http://www.rfc-editor.org/rfc/rfc4423.txt.

[33] Perkins, C. IP encapsulation within IP. RFC 2003, Internet Engineering Task Force, Oct. 1996. Available online at http://www.rfc-editor.org/rfc/rfc2003.txt.

[34] Postel, J. User datagram protocol. RFC 768, Internet Engineering Task Force, Aug. 1980. Available online at http://www.rfc-editor.org/rfc/rfc768.txt.

[35] Postel, J. Internet Control Message Protocol. RFC 792, Internet Engineering Task Force, Sept. 1981. Available online at http://www.rfc-editor.org/rfc/rfc792.txt.

[36] Postel, J. Internet Protocol. RFC 791, Internet Engineering Task Force, Sept. 1981. Available online at http://www.rfc-editor.org/rfc/rfc791.txt.

[37] Postel, J. Transmission control protocol. RFC 793, Internet Engineering Task Force, Sept. 1981. Available online at http://www.rfc-editor.org/rfc/rfc793.txt.

[38] Rekhter, Y., Moskowitz, B., Karrenberg, D., de Groot, G. J., and Lear, E. Address allocation for private internets. RFC 1918, Internet Engineering Task Force, Feb. 1996. Available online at http://www.rfc-editor.org/rfc/rfc1918.txt.

[39] Rosenberg, J., Mahy, R., Matthews, P., and Wing, D. Session Traversal Utilities for NAT (STUN). Internet-draft, Internet Engineering Task Force, Nov. 2007. Work in Progress. Available online at http://www.ietf.org/internet-drafts/draft-ietf-behave-rfc3489bis-13.txt.

[40] Rosenberg, J., Mahy, R., Matthews, P., and Wing, D. Traversal Using Relays around NAT (TURN): Relay extensions to Session Traversal Utilities for NAT (STUN). Internet-draft, Internet Engineering Task Force, Nov. 2007. Work in Progress. Available online at http://www.ietf.org/internet-drafts/draft-ietf-behave-turn-05.txt.









[41] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and Schooler, E. SIP: session initiation protocol. RFC 3261, Internet Engineering Task Force, June 2002. Available online at http://www.rfc-editor.org/rfc/rfc3261.txt.

[42] Rosenberg, J., Weinberger, J., Huitema, C., and Mahy, R. STUN - Simple Traversal of User Datagram Protocol (UDP) Through Network Address Translators (NATs). RFC 3489, Internet Engineering Task Force, Mar. 2003. Available online at http://www.rfc-editor.org/rfc/rfc3489.txt.

[43] Shepherd, M. Characteristics of switches and routers. White Paper, Juniper Networks Inc., 2006. Available online at http://www.juniper.net/solutions/literature/white_papers/200161.pdf.

[44] Srisuresh, P., and Egevang, K. Traditional IP Network Address Translator (traditional NAT). RFC 3022, Internet Engineering Task Force, Jan. 2001. Available online at http://www.rfc-editor.org/rfc/rfc3022.txt.

[45] Srisuresh, P., and Holdrege, M. IP Network Address Translator (NAT) terminology and considerations. RFC 2663, Internet Engineering Task Force, Aug. 1999. Available online at http://www.rfc-editor.org/rfc/rfc2663.txt.

[46] Srisuresh, P., Kuthan, J., Rosenberg, J., Molitor, A., and Rayhan, A. Middlebox communication architecture and framework. RFC 3303, Internet Engineering Task Force, Aug. 2002. Available online at http://www.rfc-editor.org/rfc/rfc3303.txt.

[47] Stallman, R. Why “Open Source” misses the point of Free Software, 2007. Available online at http://www.gnu.org/philosophy/open-source-misses-the-point.html.

[48] Steinleitner, N., Peters, H., and Fu, X. Implementation and performance study of a new NAT/firewall signaling protocol. In Distributed Computing Systems Workshops, 2006. ICDCS Workshops 2006. 26th IEEE International Conference on (July 2006).

[49] Stiemerling, M., Tschofenig, H., Aoun, C., and Davies, E. NAT/Firewall NSIS Signaling Layer Protocol (NSLP). Internet-Draft, Internet Engineering Task Force, Nov. 2007. Work in Progress. Available online at http://www.ietf.org/internet-drafts/draft-ietf-nsis-nslp-natfw-16.txt.




[50] Turanyi, Z., and Valko, A. IPv4+4. In Network Protocols, 2002. Proceedings. 10th IEEE International Conference on (Nov. 2002), pp. 290–299. Available online at http://www.ieee-icnp.org/2002/papers/2002-26.pdf.

[51] Turanyi, Z., Valko, A., and Campbell, A. T. 4+4: an architecture for evolving the Internet address space back toward transparency. SIGCOMM Comput. Commun. Rev. 33, 5 (Oct. 2003), 43–54.

[52] Weiser, M. Whatever happened to the next-generation Internet? Commun. ACM 44, 9 (2001), 61–69.


Appendices

APPENDIX A. RASR COMMUNICATIONS FLOW CHART

APPENDIX B. RASR IMPLEMENTATION FLOW CHART

APPENDIX C. LINUX KERNEL DEBUGGING USING USER MODE LINUX

User Mode Linux (UML) is a Linux virtualization method, allowing the Linux kernel to be compiled and run as a user space program (i.e., Linux on top of Linux). This appendix is devoted to Linux kernel debugging with the help of UML. For readers who are interested in what else can be done with UML, we recommend the book “User Mode Linux” by J. Dike [16]. Also, check out the project web page at http://user-mode-linux.sourceforge.net/.

As we are doing protocol development and thus require network connectivity for testing, we have to do some preparations on the host (i.e., create the necessary virtual network interface). This is done by running the following commands as root:

tunctl -u <user_running_uml>

chmod g+rw /dev/net/tun

The first command creates a virtual network interface, which is called tap0. Should the command be run again, a new interface would be created with every run. We are assuming that you have the necessary support for virtual networking compiled into the host kernel, and that you have the tunctl utility.

Note that the user running UML requires read and write permission to /dev/net/tun. Here we assume that the user in question belongs to the root group. Otherwise, some other permission set may be required.

GDB + UML

The GNU Debugger (GDB)¹ is a commonly used application debugger. As UML allows the Linux kernel to be run like a normal application, GDB can be used to debug that as well.

Assuming that you have already configured (e.g., with make menuconfig ARCH=um) and compiled (i.e., with make ARCH=um) the kernel, you should now have an executable file linux within the kernel source directory. This file contains the compiled kernel.

To start the debugging process, first start the debugger with the command gdb linux. Then, enter the following commands into the debugger:

handle SIGSEGV pass nostop noprint

handle SIGUSR1 pass nostop noprint

¹ http://www.gnu.org/software/gdb/gdb.html


Now we should not receive too much garbage on the screen once we start the kernel. Next, we need to input the various arguments required by the kernel:

set args ubd0=<root_fs_image> eth0=tuntap,tap0 mem=256M

Here we define the location of the root file system image used by the UML instance, the virtual network interface² and the memory reserved for the UML instance.

Now the UML instance can be started with the command r (run).

Once started, the UML instance acts like any other Linux system. To get back to the GDB prompt, send SIGINT to the first process after the gdb in the process list³. After that, it is normal GDB debugging. Some of the most needed commands are listed here, to get you started.

b(reak) - set breakpoint

d(elete) <br> - delete breakpoint <br>

p(rint) - print values

c(ontinue) - continue

n(ext) - next instruction (next line, usually)

s(tep) - like n, except that it also enters subroutines

i(nfo) source - show information about the current source file

h(elp) <func> - show help for <func>

² eth0 on the UML instance, tap0 on the host machine

³ check with the ps command

APPENDIX D. THE SOCKET BUFFER (SK_BUFF) KERNEL STRUCTURE

The following source code snippet is from file include/linux/skbuff.h of the Linux Kernel 2.6.18 ('vanilla') source tree.

struct sk_buff {
        /* These two members must be first. */
        struct sk_buff          *next;
        struct sk_buff          *prev;

        struct sock             *sk;
        struct skb_timeval      tstamp;
        struct net_device       *dev;
        struct net_device       *input_dev;

        union {
                struct tcphdr   *th;
                struct udphdr   *uh;
                struct icmphdr  *icmph;
                struct igmphdr  *igmph;
                struct iphdr    *ipiph;
                struct ipv6hdr  *ipv6h;
                unsigned char   *raw;
        } h;

        union {
                struct iphdr    *iph;
                struct ipv6hdr  *ipv6h;
                struct arphdr   *arph;
                unsigned char   *raw;
        } nh;

        union {
                unsigned char   *raw;
        } mac;

        struct dst_entry        *dst;
        struct sec_path         *sp;

        /*
         * This is the control buffer. It is free to use for every
         * layer. Please put your private variables there. If you
         * want to keep them across layers you have to do a skb_clone()
         * first. This is owned by whoever has the skb queued ATM.
         */
        char                    cb[48];

        unsigned int            len,
                                data_len,
                                mac_len,
                                csum;
        __u32                   priority;
        __u8                    local_df:1,
                                cloned:1,
                                ip_summed:2,
                                nohdr:1,
                                nfctinfo:3;
        __u8                    pkt_type:3,
                                fclone:2,
                                ipvs_property:1;
        __be16                  protocol;

        void                    (*destructor)(struct sk_buff *skb);
#ifdef CONFIG_NETFILTER
        struct nf_conntrack     *nfct;
#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
        struct sk_buff          *nfct_reasm;
#endif
#ifdef CONFIG_BRIDGE_NETFILTER
        struct nf_bridge_info   *nf_bridge;
#endif
        __u32                   nfmark;
#endif /* CONFIG_NETFILTER */
#ifdef CONFIG_NET_SCHED
        __u16                   tc_index;       /* traffic control index */
#ifdef CONFIG_NET_CLS_ACT
        __u16                   tc_verd;        /* traffic control verdict */
#endif
#endif
#ifdef CONFIG_NET_DMA
        dma_cookie_t            dma_cookie;
#endif
#ifdef CONFIG_NETWORK_SECMARK
        __u32                   secmark;
#endif

        /* These elements must be at the end, see alloc_skb() for details. */
        unsigned int            truesize;
        atomic_t                users;
        unsigned char           *head,
                                *data,
                                *tail,
                                *end;
};

APPENDIX E. COMPATIBILITY PATCH FOR THE ORIGINAL PING

diff -rup ping-orig/Makefile ping/Makefile

--- ping-orig/Makefile 2007-07-31 14:43:48.000000000 +0300

+++ ping/Makefile 2007-07-31 14:54:00.000000000 +0300

@@ -6,13 +6,13 @@

# installed. /usr/etc is a good place to put a network debugging tool such

# as this.

-DESTDIR= /usr/brl/sbin

-MANDIR= /usr/brl/man/man1

+DESTDIR= /bin/

+MANDIR= /usr/share/man/man1

# You shouldn’t need to change anything below this line.

-CC= cc

-CFLAGS = -O

+CC= gcc

+CFLAGS = -ggdb

# At the moment, the INCL variable isn’t really needed for anything.

INCL = -I.

@@ -20,8 +20,8 @@ LIBS =

# Script (or program) that returns the machine and os types,

# or just edit in the name yourself.

-MD=`mdtype`

-OS=`ostype`

+MD=`uname -m`

+OS=`uname -o`

# Explicitly define compiliation rule since SunOS 4’s make doesn’t like gcc.

# Also, gcc does not remove the .o before forking ’as’, which can be a

@@ -86,6 +86,3 @@ depend:

# DO NOT DELETE THIS LINE -- make depend uses it

-

-

-

diff -rup ping-orig/ping.c ping/ping.c

--- ping-orig/ping.c 2007-07-31 14:43:48.000000000 +0300

+++ ping/ping.c 2007-07-31 14:51:44.000000000 +0300

@@ -23,6 +23,10 @@

#include <stdio.h>

#include <errno.h>

+#include <signal.h> /* takes care of the SIGINT/SIGALARM issue */

+#include <stdlib.h> /* introduces EXIT codes */

+#include <strings.h> /* include some essential functions like bzero */

+#include <string.h> /* -"- strcpy */

#include <sys/time.h>

#include <sys/param.h>

@@ -178,8 +182,8 @@ char *argv[];

}

setlinebuf( stdout );

-signal( SIGINT, finish );

-signal(SIGALRM, catcher);

+ signal( SIGINT, (__sighandler_t)finish );

+ signal(SIGALRM, (__sighandler_t)catcher);


/* fire off them quickies */

for(i=0; i < preload; i++)

@@ -193,17 +197,19 @@ char *argv[];

int fromlen = sizeof (from);

int cc;

struct timeval timeout;

-int fdmask = 1 << s;

+ fd_set fdset;

+ FD_ZERO(&fdset);

+ FD_SET(s,&fdset);

timeout.tv_sec = 0;

timeout.tv_usec = 10000;

if(pingflags & FLOOD) {

pinger();

-if( select(32, &fdmask, 0, 0, &timeout) == 0)

+ if( select(32, &fdset, 0, 0, &timeout) == 0)

continue;

}

-if ( (cc=recvfrom(s, packet, len, 0, &from, &fromlen)) < 0) {

+ if ( (cc=recvfrom(s, packet, len, 0, (struct sockaddr *)&from, &fromlen)) < 0) {

if( errno == EINTR )

continue;

perror("ping: recvfrom");

@@ -241,7 +247,7 @@ catcher()

waittime = 1;

} else

waittime = MAXWAIT;

-signal(SIGALRM, finish);

+ signal(SIGALRM, (__sighandler_t)finish);

alarm(waittime);

}

}

@@ -359,14 +365,14 @@ struct sockaddr_in *from;

if (cc < hlen + ICMP_MINLEN) {

if (pingflags & VERBOSE)

printf("packet too short (%d bytes) from %s\n", cc,

-inet_ntoa(ntohl(from->sin_addr))); /* DFM */

+ inet_ntoa(ntohl(from->sin_addr.s_addr))); /* DFM */

return;

}

cc -= hlen;

icp = (struct icmp *)(buf + hlen);

if( (!(pingflags & QUIET)) && icp->icmp_type != ICMP_ECHOREPLY ) {

printf("%d bytes from %s: icmp_type=%d (%s) icmp_code=%d\n",

- cc, inet_ntoa(ntohl(from->sin_addr)),

+ cc, inet_ntoa(ntohl(from->sin_addr.s_addr)),

icp->icmp_type, pr_type(icp->icmp_type), icp->icmp_code);/*DFM*/

if (pingflags & VERBOSE) {

for( i=0; i<12; i++)

APPENDIX F. RASR TESTBEDS – ROUTING TABLES

The kernel routing tables from each host in our RASR testbeds (see Figures 5.1 and 5.2) are listed here. Lines starting with a star (*) are specific routing rules necessary to do the tests without RASR.

Root RG (Testbed 1)

Kernel IP routing table

Destination Gateway Genmask Flags Metric Ref Use Iface

*192.168.2.2 10.0.1.2 255.255.255.255 UGH 0 0 0 eth1

192.168.0.2 0.0.0.0 255.255.255.255 UH 0 0 0 tap0

82.130.34.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0

10.0.1.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1

192.168.0.0 0.0.0.0 255.255.255.0 U 0 0 0 tap0

0.0.0.0 82.130.34.254 0.0.0.0 UG 0 0 0 eth0
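To illustrate why the starred rule matters, the following Python sketch models route selection as a longest-prefix match over the Root RG (Testbed 1) table above. It is only an illustration under that assumption, written with the standard ipaddress module; the names ROOT_RG_ROUTES and lookup are hypothetical and do not come from the testbed.

```python
import ipaddress

# (destination prefix, gateway or None if directly connected, interface),
# transcribed from the Root RG (Testbed 1) routing table.
ROOT_RG_ROUTES = [
    ("192.168.2.2/32", "10.0.1.2", "eth1"),    # starred rule (tests without RASR)
    ("192.168.0.2/32", None, "tap0"),
    ("82.130.34.0/24", None, "eth0"),
    ("10.0.1.0/24", None, "eth1"),
    ("192.168.0.0/24", None, "tap0"),
    ("0.0.0.0/0", "82.130.34.254", "eth0"),    # default route
]


def lookup(dst, routes):
    """Return (gateway, interface) of the most specific matching route."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, gateway, iface in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gateway, iface)
    if best is None:
        return None
    return best[1], best[2]


with_star = lookup("192.168.2.2", ROOT_RG_ROUTES)
without_star = lookup("192.168.2.2", ROOT_RG_ROUTES[1:])  # starred rule removed
print(with_star, without_star)
```

With the starred host route in place, traffic to 192.168.2.2 is handed to 10.0.1.2 on eth1. Without it, the only matching entry in this model is the default route via 82.130.34.254 on eth0, so the packet would leave towards the Internet instead of the inner realm.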

Root RG1 (Testbed 2)



192.168.0.2 0.0.0.0 255.255.255.255 UH 0 0 0 tap0

82.130.34.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0

10.0.1.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1

192.168.0.0 0.0.0.0 255.255.255.0 U 0 0 0 tap0

0.0.0.0 82.130.34.254 0.0.0.0 UG 0 0 0 eth0




192.168.2.2 0.0.0.0 255.255.255.255 UH 0 0 0 tap0

62.236.206.0 0.0.0.0 255.255.255.240 U 0 0 0 eth0

192.168.2.0 0.0.0.0 255.255.255.0 U 0 0 0 tap0

0.0.0.0 62.236.206.1 0.0.0.0 UG 0 0 0 eth0

RG1 (Testbed 1)



*192.168.2.2 192.168.1.2 255.255.255.255 UGH 0 0 0 eth0.0

10.0.1.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0.1

192.168.1.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0.0

0.0.0.0 10.0.1.1 0.0.0.0 UG 0 0 0 eth0.1


RG2 (Testbed 1)



192.168.2.2 0.0.0.0 255.255.255.255 UH 0 0 0 tap0

192.168.2.0 0.0.0.0 255.255.255.0 U 0 0 0 tap0

192.168.1.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0

0.0.0.0 192.168.1.1 0.0.0.0 UG 0 0 0 eth0

UM1 (Testbeds 1/2)



192.168.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0

0.0.0.0 192.168.0.1 0.0.0.0 UG 0 0 0 eth0

UM2 (Testbeds 1/2)



192.168.2.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0

0.0.0.0 192.168.2.1 0.0.0.0 UG 0 0 0 eth0

APPENDIX G. RASR TESTBEDS – NETFILTER SETTINGS

Netfilter¹ is the Linux firewall. It controls which packets may pass the TCP/IP stack. Furthermore, it provides NAT functionality for the system.

For our test scenario, we are not interested in the firewall functionality. Actually, it is important that our test packets can traverse the system unhindered, so we keep packet filtering to a minimum (i.e., all packets are allowed). The only exceptions are our links to the Internet, the root RGs. As we pretty much have a direct connection to the Internet, we need to add some protection. The relevant rules, printed using the iptables -L -v command, are listed next.

Root RG (Testbed 1) / Root RG1 (Testbed 2)

Chain INPUT (policy DROP 0 packets, 0 bytes)

pkts bytes target prot opt in out source destination

0 0 ACCEPT all -- eth0 any anywhere anywhere state RELATED,ESTABLISHED

0 0 ACCEPT all -- eth1 any anywhere anywhere

0 0 ACCEPT all -- tap0 any anywhere anywhere

0 0 ACCEPT all -- lo any anywhere anywhere

0 0 ACCEPT all -- eth0 any 62.236.206.0/28 anywhere

0 0 ACCEPT tcp -- any any anywhere anywhere tcp dpt:ssh \

flags:FIN,SYN,RST,ACK/SYN

Chain FORWARD (policy DROP 0 packets, 0 bytes)


0 0 ACCEPT all -- eth1 any anywhere anywhere



0 0 ACCEPT all -- eth0 any 62.236.206.0/28 anywhere

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)



Chain INPUT (policy DROP 0 packets, 0 bytes)




0 0 ACCEPT all -- lo any anywhere anywhere

0 0 ACCEPT all -- eth0 any 82.130.34.117 anywhere

0 0 ACCEPT tcp -- any any anywhere anywhere tcp dpt:ssh \

flags:FIN,SYN,RST,ACK/SYN

Chain FORWARD (policy DROP 0 packets, 0 bytes)




0 0 ACCEPT all -- eth0 any 82.130.34.117 anywhere
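As a reading aid for these listings, the following Python sketch models how a chain is evaluated: rules are tried in order, the first match decides, and the chain policy applies when nothing matches. It mirrors the INPUT chain listed for Root RG (Testbed 1) / Root RG1 (Testbed 2). The packet representation (a dictionary with the keys iface, src, proto, dport, state and syn) is an assumption made for illustration, connection tracking is reduced to a precomputed state field, and the address 198.51.100.7 is a hypothetical outside host.

```python
import ipaddress

TRUSTED_NET = ipaddress.ip_network("62.236.206.0/28")

# One predicate per ACCEPT rule, in the order of the INPUT chain listing.
INPUT_RULES = [
    lambda p: p["iface"] == "eth0" and p["state"] in ("RELATED", "ESTABLISHED"),
    lambda p: p["iface"] == "eth1",
    lambda p: p["iface"] == "tap0",
    lambda p: p["iface"] == "lo",
    lambda p: p["iface"] == "eth0"
    and ipaddress.ip_address(p["src"]) in TRUSTED_NET,
    # tcp dpt:ssh flags:FIN,SYN,RST,ACK/SYN, i.e. a new SSH connection attempt
    lambda p: p["proto"] == "tcp" and p["dport"] == 22 and p["syn"],
]


def input_verdict(packet, rules=INPUT_RULES, policy="DROP"):
    """First matching rule accepts; otherwise the chain policy applies."""
    for matches in rules:
        if matches(packet):
            return "ACCEPT"
    return policy


def new_tcp(src, dport, iface="eth0"):
    """Build a hypothetical packet that opens a new TCP connection."""
    return {"iface": iface, "src": src, "proto": "tcp",
            "dport": dport, "state": "NEW", "syn": True}


print(input_verdict(new_tcp("198.51.100.7", 80)))   # unknown host, port 80
print(input_verdict(new_tcp("198.51.100.7", 22)))   # unknown host, SSH
print(input_verdict(new_tcp("62.236.206.5", 80)))   # host in the trusted /28
```

In this model an unsolicited connection from an unknown host to any port other than SSH falls through to the DROP policy, while SSH and traffic from the 62.236.206.0/28 network are accepted.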



¹ http://www.netfilter.org/


RG1/RG2 (Testbed 1)/ UM1/UM2 (Testbeds 1/2)

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)


Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)




When RASR is enabled, the NAT functionality must be disabled as our test implementation currently does not handle NAT properly. Furthermore, for testbed 1, we managed to do our reference tests (i.e., tests where RASR is disabled) with the specific routing rules given in Appendix F. So, the NAT functionality was disabled for each node. The command used to print the following listing is iptables -t nat -L.

Chain PREROUTING (policy ACCEPT)

target prot opt source destination

Chain POSTROUTING (policy ACCEPT)

target prot opt source destination

Chain OUTPUT (policy ACCEPT)

target prot opt source destination

In the case of testbed 2, routing rules alone are not enough, as we are using addresses from the private IPv4 address spaces and such addresses are not routed by most routers. So, NAT is needed. The normal masquerade rule (i.e., probably the most common NAT rule there is), which rewrites the source address of outgoing packets, is a good start. This is achieved with the following command.

iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

It is not enough, though, as we want to access a service provided behind NAT (i.e., our virtual host on the receiving end). For that, a simple Destination NAT (DNAT) rule is used, effectively forwarding all incoming traffic to the virtual host by rewriting the destination address of incoming packets. The rule is set with the following command.

iptables -t nat -A PREROUTING -i eth0 -j DNAT --to 192.168.2.2
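The two NAT rules above can be combined into one small script. This is a sketch only: the function name enable_testbed2_nat, the helper ipt and the APPLY switch are hypothetical, and the interface and virtual host address default to the values used in the commands above. The long option --to-destination is used in place of the abbreviated --to.

```shell
#!/bin/sh
# Sketch: the masquerade and DNAT rules of testbed 2 as one parameterized step.
# Dry run by default; set APPLY=1 (as root) to execute the commands.
# "ipt", "enable_testbed2_nat" and APPLY are hypothetical names, not from the thesis.

ipt() {
    if [ "${APPLY:-0}" = "1" ]; then
        iptables "$@"
    else
        echo "iptables $*"
    fi
}

enable_testbed2_nat() {
    wan_if="${1:-eth0}"                # Internet-facing interface
    virtual_host="${2:-192.168.2.2}"   # virtual host behind the NAT

    # Source NAT: outgoing packets leave with the address of $wan_if.
    ipt -t nat -A POSTROUTING -o "$wan_if" -j MASQUERADE
    # Destination NAT: all traffic arriving on $wan_if goes to the virtual host.
    ipt -t nat -A PREROUTING -i "$wan_if" -j DNAT --to-destination "$virtual_host"
}

enable_testbed2_nat
```

Calling enable_testbed2_nat with two arguments (interface, then address) adapts the sketch to another node.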

APPENDIX H. REACHABILITY TEST RESULTS

This appendix contains reachability test results against 50 university web servers around the world. Values are average RTTs in milliseconds (ms). In the last column, RASR ping, '[M]' after an RTT value means that the server in question mirrors RASR options in reply packets (i.e., copies the options from incoming packets to the reply packets).

server                                      host (address)                        normal ping  RASR ping
Univ Buenos Aires (Argentina)               www.uba.ar (157.92.44.2)              -            -
Univ Melbourne (Australia)                  www.unimelb.edu.au (128.250.6.182)    314.541      -
Univ Sydney (Australia)                     www.usyd.edu.au (129.78.64.24)        331.753      -
Univ Vienna (Austria)                       www.univie.ac.at (131.130.1.78)       50.729       -
Univ Libre Bruxelles (Belgium)              www.ulb.ac.be (164.15.59.215)         42.788       -
Univ Sao Paulo (Brazil)                     www.usp.br (143.107.254.11)           -            -
Univ British Columbia (Canada)              www.ubc.ca (64.40.111.228)            196.600      -
Univ Toronto (Canada)                       www.utoronto.ca (142.150.210.13)      130.712      -
Univ Chile (Chile)                          www.uchile.cl (200.89.70.188)         4503.318     -
Univ Hong Kong (China - Hong Kong)          www.hku.hk (147.8.145.43)             -            -
Peking Univ (China)                         www.pku.edu.cn (162.105.129.12)       -            -
Natl Taiwan Univ (China - Taiwan)           www.ntu.edu.tw (140.112.8.130)        319.655      434.573
Tsinghua Univ (China)                       www.tsinghua.edu.cn (211.151.91.165)  372.835      -
Charles Univ Prague (Czech)                 www.cuni.cz (195.113.0.123)           41.372       -
Univ Copenhagen (Denmark)                   www.ku.dk (130.225.126.132)           -            -
Cairo Univ (Egypt)                          www.cu.edu.eg (195.246.45.241)        -            -
Univ Helsinki (Finland)                     www.helsinki.fi (128.214.205.16)      1.027        14.606
Tech Univ Helsinki (Finland)                www.tkk.fi (130.233.240.9)            2.447        12.316 [M]
Univ Paris 06 (France)                      www.upmc.fr (134.157.146.23)          -            -
Univ Munich (Germany)                       www.uni-muenchen.de (141.84.120.199)  -            -
Tech Univ Munich (Germany)                  www.tu-muenchen.de (129.187.39.3)     -            -
Univ Athens (Greece)                        www.uoa.gr (195.134.100.100)          84.358       132.707 [M]
Eotvos Lorand Univ (Hungary)                www.elte.hu (157.181.2.5)             54.037       -
Indian Inst Sci                             www.iisc.ernet.in (220.227.207.41)    -            -
Trinity Coll Dublin (Ireland)               www.tcd.ie (134.226.1.30)             65.477       -
Hebrew Univ Jerusalem (Israel)              www.huji.ac.il (132.64.3.20)          -            -
Univ Milan (Italy)                          www.unimi.it (159.149.53.216)         53.011       73.986
Kyoto Univ (Japan)                          www.kyoto-u.ac.jp (130.54.120.209)    314.146      -
Tokyo Univ (Japan)                          www.u-tokyo.ac.jp (133.11.128.254)    293.456      311.743
Univ Nacl Autonoma Mexico (Mexico)          www.unam.mx (132.248.10.44)           -            -
Univ Utrecht (Netherlands)                  www.uu.nl (131.211.16.222)            32.721       53.151
Univ Auckland (New Zealand)                 www.auckland.ac.nz (130.216.11.202)   333.847      -
Univ Oslo (Norway)                          www.uio.no (129.240.4.44)             15.402       29.969 [M]
Jagiellonian Univ (Poland)                  www.uj.edu.pl (149.156.89.139)        78.633       94.282 [M]
Moscow State Univ (Russia)                  www.msu.ru (193.232.113.129)          59.935       82.775
Natl Univ Singapore (Singapore)             www.nus.edu.sg (137.132.12.114)       -            -
Univ Pretoria (South Africa)                www.up.ac.za (137.215.97.20)          -            -
Seoul Natl Univ (South Korea)               www.snu.ac.kr (147.46.10.48)          -            -
Univ Barcelona (Spain)                      www.ub.es (161.116.100.2)             -            -
Uppsala Univ (Sweden)                       www.uu.se (130.238.7.10)              11.087       20.702
Univ Geneva (Switzerland)                   www.unige.ch (129.194.9.50)           45.811       -
Univ Zurich (Switzerland)                   www.unizh.ch (130.60.127.166)         45.448       -
Swiss Fed Inst Tech - Zurich (Switzerland)  www.ethz.ch (129.132.46.11)           -            -
Univ Istanbul (Turkey)                      www.istanbul.edu.tr (194.27.128.98)   -            -
Univ Cambridge (UK)                         www.cam.ac.uk (131.111.8.46)          47.882       110.418 [M]
Univ Edinburgh (UK)                         www.ed.ac.uk (129.215.13.199)         51.663       93.112 [M]
Univ Oxford (UK)                            www.ox.ac.uk (163.1.13.189)           43.065       -
Univ California - Berkeley (USA)            www.berkeley.edu (169.229.131.92)     197.768      -
Harvard Univ (USA)                          www.harvard.edu (128.103.60.28)       125.071      -
Massachusetts Inst Tech (MIT) (USA)         www.mit.edu (18.7.22.83)              124.443      186.969
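When reading the table, it can help to look at the extra delay of a RASR ping relative to a normal ping for servers that answered both. The following sketch computes the difference and the relative increase for three rows copied from the table above. The function name rasr_overhead and the three-column input format (host, normal RTT, RASR RTT) are our own assumptions, not part of the test setup.

```shell
#!/bin/sh
# Sketch: RTT overhead of RASR ping over normal ping, per server.
# Input lines: host, normal ping RTT (ms), RASR ping RTT (ms).
# "rasr_overhead" is a hypothetical helper; the values are taken from the table.

rasr_overhead() {
    awk '{ printf "%s %.3f ms (+%.0f%%)\n", $1, $3 - $2, ($3 - $2) / $2 * 100 }'
}

rasr_overhead <<'EOF'
www.helsinki.fi 1.027 14.606
www.uu.se 11.087 20.702
www.mit.edu 124.443 186.969
EOF
```

Since the table reports averages only, such figures are indicative at best; they say nothing about the variance of individual measurements.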

APPENDIX I. SOLUTION COMPARISON TABLE

Solution  Modified entities  New entities            Identity           Location           Implemented  Protocol specific
IPv6      all                -                       IPv6 addr.         IPv6 addr.         Yes          Yes
IPNL      Host and NAT       -                       IPNL addr.         IPNL addr.         Yes          Yes
IPv4+4    Host and NAT       -                       IPv4+4 addr.       IPv4+4 addr.       Yes          Yes
MIDCOM    NAT                MIDCOM agent            IPv4 addr.         IPv4 addr.         No*          No
NSIS      Host and NAT       -                       IPv4 addr.         IPv4 addr.         Yes          No
Plutarch  Host and NAT       -                       context dependent  context dependent  No*          No
RSIP      Host and NAT       -                       IPv4 addr.         IPv4 addr.         Yes          Yes
Teredo    Host               Teredo Server & Relay   IPv4/6 addr.       IPv4/6 addr.       Yes          Yes
M-FARA    Host and NAT       M-agent [12]            No global id       FD                 Yes          No
HIP       Host and NAT       -                       HI                 IP addr.           Yes          Yes
Nimrod    all                -                       EID                Locator            No*          No
NUTSS     Host               SIP & STUNT server      SIP URI            IP addr.           Yes          Yes
LNA       Host, NAT, DNS     -                       SID, EID           NLA                No*          No
SON       Host               SG, S-PoP               SID, OID           SID, OID           No*          No
RASR      Host, NAT, DNS     -                       UA                 UA                 Yes          No

* We are not aware of any working implementations for the solution(s) in question.
