THE MAGAZINE OF GLOBAL INTERNETWORKING
September/October 2008, Vol. 22, No. 5
A Publication of the IEEE Communications Society in cooperation with the IEEE Computer Society and the Internet Society
www.comsoc.org
Implications and Control of Middleboxes in the Internet



IEEE NETWORK ISSN 0890-8044 is published bimonthly by the Institute of Electrical and Electronics Engineers, Inc. Headquarters address: IEEE, 3 Park Avenue, 17th Floor, New York, NY 10016-5997, USA; tel: +1-212-705-8900; e-mail: [email protected]. Responsibility for the contents rests upon authors of signed articles and not the IEEE or its members. Unless otherwise specified, the IEEE neither endorses nor sanctions any positions or actions espoused in IEEE Network.

ANNUAL SUBSCRIPTION: $40 in addition to IEEE Communications Society or any other IEEE Society member dues. Non-member prices: $250. Single copy price $50.

EDITORIAL CORRESPONDENCE: Address to: Chatschik Bisdikian, Editor-in-Chief, IEEE Network, IEEE Communications Society, 3 Park Avenue, 17th Floor, New York, NY 10016-5997, USA; e-mail: [email protected]

COPYRIGHT AND REPRINT PERMISSIONS: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limits of U.S. Copyright law for private use of patrons: those articles that carry a code on the bottom of the first page provided the per copy fee indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, USA. For other copying, reprint, or republication permission, write to Director, Publishing Services, at IEEE Headquarters. All rights reserved. Copyright ©2008 by the Institute of Electrical and Electronics Engineers, Inc.

POSTMASTER: Send address changes to IEEE Network, IEEE, 445 Hoes Lane, Piscataway, NJ 08855-1331, USA. Printed in USA. Periodical-class postage paid at New York, NY and at additional mailing offices. Bulk rate postage paid at Easton, PA permit #7. Canadian GST Reg# 40030962. Return undeliverable Canadian addresses to: Frontier, P.O. Box 1051, 1031 Helena Street, Fort Erie, ON L2A 6C7.

SUBSCRIPTIONS, orders, address changes should be sent to IEEE Service Center, 445 Hoes Lane, Piscataway, NJ 08855-1331, USA. Tel. +1-732-981-0060.

ADVERTISING: Advertising is accepted at the discretion of the publisher. Address correspondence to IEEE Network, 3 Park Avenue, 17th Floor, New York, NY 10016-5997, USA.


A Retrospective View of Network Address Translation
Today, network address translators, or NATs, are everywhere. Their ubiquitous adoption was not promoted by design or planning but by the continued growth of the Internet.
Lixia Zhang

Behavior and Classification of NAT Devices and Implications for NAT Traversal
For a long time, traditional client-server communication was the predominant communication paradigm of the Internet. Network address translation devices emerged to help with the limited availability of IP addresses and were designed with the hypothesis of asymmetric connection establishment in mind. But with the growing success of peer-to-peer applications, this assumption is no longer true.
Andreas Müller, Georg Carle, and Andreas Klenk

Modeling Middleboxes
The authors present a simple middlebox model that succinctly describes how different middleboxes process packets and illustrate it by representing four common middleboxes.
Dilip Joseph and Ion Stoica

Network Address Translation for the Stream Control Transmission Protocol
The authors discuss the deficiencies of using existing NAT methods for SCTP and describe a new SCTP-specific NAT concept. This concept is analyzed in detail for several important network scenarios, including peer-to-peer, transport layer mobility, and multihoming.
Michael Tüxen, Irene Rüngeler, Randall Stewart, and Erwin P. Rathgeb

Distributed Connectivity Service for a SIP Infrastructure
The authors present a distributed connectivity service solution that integrates relay functionality directly in user nodes.
Luigi Ciminiera, Guido Marchetto, Fulvio Risso, and Livio Torrero

Dial “M” for Middlebox Managed Mobility
Users can be served by multiple network-enabled terminal devices, each of which in turn can have multiple network interfaces. This multihoming at both the user and device level presents new opportunities for mobility handling.
Stephen Herborn and Aruna Seneviratne

NAT Issues in the Remote Management of Home Network Devices
The authors focus on NAT issues in the management of home network devices. Specifically, they discuss efforts relating to standardization.
Choongul Park, Kitae Jeong, Sungil Kim, and Youngseok Lee

Improving the Performance of Route Control Middleboxes in a Competitive Environment
The authors show that by blending randomization with adaptive filtering techniques, it is possible to drastically reduce the interference between competing route controllers, and this can be achieved without penalizing the end-to-end traffic performance.
Marcelo Yannuzzi, Xavi Masip-Bruin, Eva Marin-Tordera, Jordi Domingo-Pascual, Alexandre Fonte, and Edmundo Monteiro


Editor’s Note
New Books & Multimedia
Guest Editorial

Special Issue
Implications and Control of Middleboxes in the Internet
Guest Editors: Xiaoming Fu, Martin Stiemerling, and Henning Schulzrinne


Director of Magazines
Thomas F. La Porta, Penn. State Univ., USA

Editor-in-Chief
Ioanis Nikolaidis, U. of Alberta, Canada

Associate Editor-in-Chief
Chatschik Bisdikian, IBM Research, USA

Senior Technical Editors
Thomas M. Chen, Swansea U., UK
Yi-Bing (Jason) Lin, National Chiao Tung Univ., Taiwan
Peter O’Reilly, Northeastern Univ., USA

Technical Editors
Kevin Almeroth, UCSB, USA
N. Asokan, Nokia Res. Ctr., Finland
Olivier Bonaventure, U. Catholique de Louvain, Belgium
Adrian Conway, Verizon, USA
Jon Crowcroft, U. of Cambridge, UK
Christos Douligeris, U. of Piraeus, Greece
Paolo Giacomazzi, Politecnico di Milano, Italy
David Greaves, U. of Cambridge, UK
Nikhil Jain, Qualcomm, USA
Admela Jukan, T. U. Braunschweig, Germany
Tim King, BTexact Tech., UK
Frank Magee, Consultant, USA
Ioanis Nikolaidis, U. of Alberta, Canada
Georgios I. Papadimitriou, Aristotle Univ., Greece
Mohammad Peyravian, IBM Corporation, USA
Kazem Sohraby, U. of Arkansas, USA
James Sterbenz, Univ. of Kansas, USA
Joe Touch, USC/ISI, USA
Vittorio Trecordi, CEFRIEL, Italy
Guoliang Xue, Arizona State Univ., USA
Raj Yavatkar, Intel, USA
Bulent Yener, Rensselaer Polytechnic Institute, USA

Feature Editors
Olivier Bonaventure, "Software Tools for Networking," U. Catholique de Louvain, Belgium
Olivier Bonaventure, "New Books & Multimedia," U. Catholique de Louvain, Belgium

IEEE Production Staff
Joseph Milizzo, Assistant Publisher
Eric Levine, Associate Publisher
Susan Lange, Digital Production Manager
Catherine Kemelmacher, Associate Editor
Jennifer Porcello, Publications Coordinator
Devika Mittra, Publications Assistant

2008 IEEE Communications Society Officers
Doug Zuckerman, President
Andrzej Jajszczyk, VP–Technical Activities
Mark Karol, VP–Conferences
Byeong Gi Lee, VP–Member Relations
Sergio Benedetto, VP–Publications
Nim Cheung, Past President
Stan Moyer, Treasurer
John M. Howell, Secretary

Board of Governors
The officers above plus Members-at-Large:
Class of 2008: Thomas M. Chen, Andrea Goldsmith, Khaled Ben-Letaief, Peter J. McLane
Class of 2009: Thomas LaPorta, Theodore Rappaport, Catherine Rosenberg, Gordon Stuber
Class of 2010: Fred Bauer, Victor Frost, Stefano Galli, Lajos Hanzo

2008 IEEE Officers
Lewis M. Terman, President
John R. Vig, President-Elect
Barry L. Shoop, Secretary
David G. Green, Treasurer
Leah H. Jamieson, Past President
Jeffry W. Raynes, Executive Director
Curtis A. Siller, Jr., Director, Division III


EDITOR’S NOTE

NATs and Frozen Veggies

Ioanis Nikolaidis

Dear readers, welcome to the September 2008 issue of IEEE Network. The sound of trucks, the heavy duty disposal bins on the curb, and the thud and bang of construction are all elements of a “quiet” summer, full of renovations, in my neighborhood. Resisting this Siren’s call is difficult even if I swore off any renovations for the rest of my life, given past experience. I naively thought that this time it wouldn’t be that bad. After all, this time it looks like a much smaller job than last time. Of course, I neglected a key conservation law: if a job is small, the additional delays for various reasons will expand it to be roughly equal to the total time of a “big” job (more professionally managed, one might argue, and hence with much less slack). What I was not prepared to experience is the shift in attitudes caused by the widespread adoption of many “information” appliances in today’s household.

I should have spotted the shift when my contractor warned that he would need to turn off the power to our house, only to qualify it with “If that’s okay with your gear, right?” noticing that there were maybe a tad too many devices, computers, firewalls, servers, and bridges spread around the house. He was concerned that some might develop bad hiccups after the switch was turned off and on again. He had himself experienced some “unhealthy” side effects to his equipment under similar circumstances, so his concern was genuine. I thought for a moment of explaining the benefits of statelessness and how, I would hope, most of my gear could survive power being cut and restored later (no, I don’t have a UPS; I believe in luck). I decided not to expand on the topic, just agreeing that it was okay to cut the power to the house.

Things indeed went as planned, although it should have struck me as odd that he did not ask about other things that might be influenced by cutting the power. A few days later, while I was at work, the contractor stumbled on a dilemma. He had to run an industrial-strength vacuum cleaner to pick up lots of debris. Having pulled down walls and removed several wall outlets left him with no choice but to run an extension cord to the nearest outlet he could find still standing. It happened that this was an outlet already fully populated by two cords, one connecting a refrigerator we keep in the basement, and one connecting a NAT/firewall box, a nearby server, and a cable modem. Without any hesitation, he removed the one least likely to create a hassle: the refrigerator!

In comparison to a NAT box, a refrigerator is low tech and almost stateless, if not for its volatile contents. His choice was reasonable. He was not expecting to keep it this way for more than an hour. But human nature conspired. The contractor forgot to plug in the refrigerator when he was done. The packets were running smoothly while our frozen veggies were thawing. To make matters worse, and blame my own human nature here, I did not notice the “failure” until late in the evening (okay, so I do keep some beer there too). Had it been the firewall malfunctioning I would have spotted it in minutes. I spent a good part of the evening deciding what had to be thrown away and what to keep (luckily this was not a warm day) and laughing at our priorities: mine and the contractor’s.

The fact that everyday people think of consumer-grade networking and information appliances as possibly the most sensitive objects in a house reflects what they have learned from their own experience in the recent past. After all, a lost file can be a major blow, while a pound of rotten spinach is, well, compost. A handful of remarkable technologies made it into these everyday devices, and one that is still a topic of research, extension, and overall controversy is Network Address Translation (NAT). NAT is no longer just a way to establish a home user’s little kingdom of an Internet-connected private network (while guilt-free of hoarding IP addresses). NAT boxes are increasingly active participants as the “middle point” of communication paths, and this has led to the use of a new term, “middlebox,” to describe the particular class of technologies.

This special issue, entitled “Implications and Control of Middleboxes in the Internet,” provides a timely review of where we are in middlebox evolution and how they might further evolve. I would like to thank the guest editors, Xiaoming Fu, Martin Stiemerling, and Henning Schulzrinne, as well as the liaison editor of this issue, Jon Crowcroft, for their excellent work in putting this issue together. I would also like to welcome a new member to our editorial board: Dr. Admela Jukan. Dr. Jukan received her Ph.D. degree from Vienna University of Technology in Austria, and is currently a W3 Professor of Electrical and Computer Engineering at the Technical University Carolo-Wilhelmina of Brunswick (Braunschweig), Germany. Dr. Jukan served between 2002 and 2004 as Program Director in Computer and Networks System Research at the National Science Foundation (NSF), responsible for funding and coordinating US-wide university research and education activities in the area of network technologies and systems.

As always, your feedback regarding the direction and substance of the magazine is invaluable and always appreciated. Please contact me by e-mail at [email protected] to let me know what you think about the editorial comments, what type of content might be more interesting to you, and in what ways the magazine’s distinct character could be improved or further publicized.


LAN Switch Security: What Hackers Know About Your Switches
Eric Vyncke and Christopher Pagen, Cisco Press, 2008, ISBN-10: 1-58705-256-3, Softbound, 360 pages

Ethernet is now the default fixed local area network technology. Ethernet LANs are found in all enterprise environments, and in more and more home networks. Ethernet was designed in the 1970s when security was not a concern. Since then, Ethernet has evolved with the introduction of hubs and switches. Many network administrators are aware that hubs are a security concern since they broadcast Ethernet frames, and some of them assume that switches are more secure. Unfortunately, hackers have learned the limitations of Ethernet switches and have developed several tools that can be used to exploit them.

This book describes the current state of the art in securing Ethernet switches. The authors take a practical approach by using different types of Cisco switches and freely available tools to demonstrate the security problems and their solutions. Despite its focus on a single vendor, this book is an interesting reference for system administrators who want to better understand how to secure their Ethernet networks. This is particularly important in environments such as schools, where uncontrolled laptops are often connected.

The first part discusses the basic security problems that affect Ethernet switches: the learning bridge process and the implications of the limited size of the MAC table on Ethernet switches. It also discusses configurations to mitigate these problems. Then the book analyzes several protocols and their security implications: the spanning tree protocol, 802.1Q VLANs, DHCP, IPv4 ARP, and IPv6 Neighbor Discovery, but also surprising electrical security issues with power over Ethernet. The second part focuses on techniques that can be used on switches to sustain denial-of-service attacks, from both forwarding and control plane viewpoints. The last part analyzes recent techniques that can be used to improve the security of Ethernet switches, such as 802.1X or 802.1AE and access control lists.

Principles of Protocol Design
Robin Sharp, Springer Verlag, 2008, ISBN: 978-3-540-77540-9, Hardbound, 402 pages

This book takes an unusual path to describe computer network protocols. While most standard networking texts mainly focus on a textual description of the different protocols and mechanisms, Robin Sharp starts from formal description techniques. More precisely, he chooses the Communicating Sequential Processes (CSP) notation proposed by Hoare. CSP is a process algebra that allows the interactions among communicating processes to be modeled. The book starts with a detailed description of CSP and then uses the CSP formalism to describe several mechanisms such as flow and error control, fault-tolerant broadcast, and two-phase commits. An advantage of using CSP is that the book contains proofs of several of the described mechanisms. However, as CSP does not contain complex data types, it is difficult to completely model complex protocols in detail. Surprisingly, the author did not consider more powerful formal description techniques that evolved from CSP, such as LOTOS.

The second part of the book is more heterogeneous. Several security protocols are discussed, and the BAN logic is introduced. Then the author briefly discusses real protocols. The discussion considers both open system interconnection (OSI) protocols and Internet protocols. This part is less interesting than the first part, where the CSP models could be of interest to readers who are more interested in the application of formal description techniques to network protocols.

Patterns in Network Architecture: A Return to Fundamentals
John Day, Prentice Hall, 2008, ISBN-10: 0132252422, Hardbound, 464 pages

The architecture of today’s Internet was mainly designed together with the TCP and IP protocols in the 1970s and early 1980s. In recent years, researchers and funding organizations in America, Europe, and Asia have started to work on different alternative architectures for the Internet. Some consider an evolutionary approach where the Internet architecture would be incrementally modified in a backward compatible manner, while others believe a completely new architecture should be developed to take into account the requirements of today’s and tomorrow’s Internet.

John Day’s book is a must read for researchers interested in the evolution of the Internet architecture. The book is composed of two main parts. The first part is mainly a history of the evolution of computer network architectures in the 1970s and 1980s. John Day participated actively in this research on both the Internet side and the OSI side. He explains the reasons for some of the design choices and discusses alternatives that were considered but not selected. The discussion considers several of the key elements of a computer network architecture, including the protocol elements, layering, naming, and addressing.

The second part describes John Day’s vision of an alternative network architecture. For this, he starts by reconsidering network-based InterProcess Communication (IPC) and shows that a distributed IPC should be at the core of a computer network architecture. This discussion is interesting, but the author does not explain in detail how it could be realized in practice. The second part ends with two chapters on topological addressing influenced by Mike O’Dell’s GSE proposal, and a discussion of the impact of multicast and multihoming on the architecture.

NEW BOOKS AND MULTIMEDIA/EDITED BY OLIVIER BONAVENTURE

The New Books and Multimedia column contains brief reviews of new books in the computer communications field. Each review includes a highly abstracted description of the contents, relying on the publisher’s descriptive materials, minus advertising superlatives, and checked for accuracy against a copy of the book. The reviews also comment on the structure and the target audience of each book. Publishers wishing to have their books listed in this manner should contact Olivier Bonaventure by email.

Olivier Bonaventure
Université Catholique de Louvain, Belgium

[email protected]


GUEST EDITORIAL

Middleboxes in the Internet have been explored, sometimes quite controversially, in operations, standardization, and the research community for more than 10 years. The main concern in the past has been that they contradict the Internet’s end-to-end principle, which is often understood to posit that “intelligence” is placed in end systems while network elements just forward packets. Middleboxes introduce functions beyond forwarding in the data path between a source and destination, as described, for example, in RFC 3234, which covers a wide range of middleboxes, from TCP performance enhancing proxies to transcoders.

On the other hand, middleboxes were introduced in the Internet for various reasons: NATs are intended to decouple the internal IP addressing from the public address space while allowing multiple hosts to share a single public IP address, for the purpose of preserving the IP address space; firewalls are used by administrators to enforce policies on the data traffic at administrative borders, with the intention of preventing their networks from being attacked or monitored; and application level gateways (ALGs) are typically used to assist applications in their operations.

The implications of the emergence and popularity of middleboxes are complicated. With middleboxes it is difficult to provide even basic end-to-end connectivity for many applications. For example, Internet hosts behind NATs can only initiate a TCP connection with another host; they cannot accept a connection request. In the past, the vast majority of applications followed the client-server design pattern, and most hosts behind NATs were clients anyway (e.g., your browser accessing a Web server). Today, a variety of new applications, such as voice over IP (VoIP), gaming, and peer-to-peer file sharing, run into a long list of issues. Hosts behind NATs are no longer reachable from other hosts, which is particularly troublesome for VoIP and other peer-to-peer applications. Likewise, firewalls are usually statically configured to block certain TCP ports, or do not understand non-TCP protocols, making it difficult to deploy new applications and protocols. This results in a number of issues to be considered in the design and development of new protocols and applications.
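The asymmetry between initiating and accepting connections can be illustrated with a toy model. The following is a minimal sketch in Python, not the behavior of any particular NAT device: the class name `SimpleNat`, the starting port 40000, and the example addresses are all assumptions chosen for illustration. An outbound packet creates a mapping; an inbound packet is delivered only if a mapping already exists.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleNat:
    """Toy port-translating NAT (hypothetical model for illustration)."""
    public_ip: str
    next_port: int = 40000  # assumed start of the public port pool
    # (private_ip, private_port) -> public_port
    out_map: dict = field(default_factory=dict)
    # public_port -> (private_ip, private_port)
    in_map: dict = field(default_factory=dict)

    def outbound(self, src_ip, src_port):
        """An inside host initiates: create (or reuse) a mapping."""
        key = (src_ip, src_port)
        if key not in self.out_map:
            self.out_map[key] = self.next_port
            self.in_map[self.next_port] = key
            self.next_port += 1
        return (self.public_ip, self.out_map[key])

    def inbound(self, dst_port):
        """An outside host sends to the public address: deliver only if
        a mapping exists; otherwise return None (packet dropped)."""
        return self.in_map.get(dst_port)


nat = SimpleNat("203.0.113.5")
public = nat.outbound("192.168.1.10", 5000)  # inside client connects out
reply_target = nat.inbound(public[1])        # reply reaches the client
unsolicited = nat.inbound(80)                # no mapping, so dropped (None)
```

In this model a reply to an established mapping finds its way back to the inside host, while an unsolicited request to port 80 finds no mapping, which is why a server behind the NAT is unreachable without some form of traversal or explicit configuration.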

To mitigate the negative impacts of these issues, quite a number of techniques have been developed, which can be categorized as explicit control and implicit control of firewalls and NATs. For explicit control, an entity, either the end host or a proxy in the network, has a relationship with the middlebox and controls its behavior (e.g., the set of policies or filter rules loaded). Examples of explicit control are universal plug and play (UPnP), Internet Engineering Task Force (IETF) Middlebox Communications (MIDCOM), and IETF Next Steps in Signaling (NSIS). On the other hand, implicit control is the traditional way of traversing middleboxes. Implicit control does not involve any control relationship with the middlebox; instead, end hosts, possibly with the support of other end hosts, use hole punching techniques to achieve working middlebox traversal. Examples of implicit control are the IETF’s Session Traversal Utilities for NAT (STUN), Traversal Using Relays around NAT (TURN), and Interactive Connectivity Establishment (ICE). In addition, there have been some recent attempts to design or use certain types of middleboxes, such as various application proxies.

In this special issue we are pleased to introduce a series of state-of-the-art articles on this specific area. These articles cover the subject from a variety of perspectives, offering readers an understanding of the issues and implications of various middleboxes in the Internet, including their control mechanisms. A total of eight articles, selected from 26 submissions based on a strict peer review process, cover a broad range in the field of implications and control of middleboxes in the Internet. While some articles present more general issues with middleboxes, understanding their behaviors and implications, others focus on new approaches to controlling and using middleboxes.

NATs, an unplanned reality, have posed complications to the Internet architecture and applications. The first article, “A Retrospective View of NAT” by Lixia Zhang, takes readers back to the early days of middleboxes. It gives a historic review of NATs and the lessons learned, including how they impeded standardization and deployment of IPv6, an expected solution for addressing the Internet address depletion problem. Because NAT was not standardized in a timely manner, a number of different NAT implementations exist today, and it is vital to understand their behaviors due to their nearly ubiquitous presence.

The second article, “Behavior and Classification of NAT Devices and Implications for NAT Traversal” by Andreas Müller, Andreas Klenk, and Georg Carle, provides a comprehensive overview of NAT behaviors and currently available NAT traversal techniques. The article presents a new categorization approach based on an analytical abstraction of NAT traversal, which classifies NAT traversal services into four distinct types and deduces the corresponding NAT behaviors. This may help developers of new protocols and applications to determine applicable techniques for NAT traversal.

While the first two articles describe the history, behavior, and classification of NAT, the next article by Dilip Joseph and Ion Stoica, “Modeling Middleboxes,” proposes a formal and generic model for deducing middlebox functionalities and behaviors. Using this model, the article illustrates how different middleboxes process packets, and how four common middleboxes (firewall, NAT, and layer 4 and layer 7 load balancers) may be depicted. As such, the article provides an initial step for relevant designers, users, and researchers to understand and refine the behaviors and implications of various middleboxes.

Existing middleboxes mostly consider TCP and UDP in their implementations, and typically do not support other protocols, such as the Stream Control Transmission Protocol (SCTP). In the fourth article, Michael Tüxen et al. describe the extensions required to support NAT for SCTP. The analysis presented in this article may be useful as a general lesson in the near future, as several other protocols after SCTP, including DCCP, XCP, and HIP, use similar techniques such as multihoming, rehoming, and handshake cookies.

Applications using the Session Initiation Protocol (SIP) or operating in a peer-to-peer fashion (P2PSIP or just normal P2P applications) are among those that suffer most from the middlebox traversal issue. The fifth article, “Distributed Connectivity Service for a SIP Infrastructure” by Luigi Ciminiera et al., examines this issue and presents an alternative to the current STUN/TURN/ICE approach to middlebox traversal. The approach distributes the rendezvous and relay functions among SIP user agents, which discover their peers autonomously and maintain a P2P overlay to ensure connectivity across NATs and firewalls in a SIP infrastructure without relying on a centralized server.

The remaining three articles address new applications of middleboxes. The sixth article, “Dial M for Middlebox Managed Mobility” by Stephen Herborn and Aruna Seneviratne, describes a new usage type of middleboxes for mobility support via the concept of virtual private “personal networks.” Such a network is created and maintained by way of HIP combined with IPsec and supported by middlebox state, which may be interesting (at least to some extent) for the recent research efforts on network virtualization, as they use today’s technologies directly.

An increasing number of home users today are using NATs to connect their home IP devices with the Internet. Choongul Park et al. discuss this issue in their article “NAT Issues in the Remote Management of Home Network Devices.” By extending SNMP and using additional management objects (MOs) to gather NAT binding information, the authors attempt to address the NAT traversal problem under a symmetric NAT, based on their observations in Korea. While the success rate of NAT traversal could be a potential issue outside Korea, the article provides an insight into what home networking standards may have to deal with.

Yet another type of middlebox function, intelligent route control (IRC) for multihomed sites and subscribers, has been recently identified as a key issue in efficient network operations. The final article, “Improving the Performance of Route Control Middleboxes in a Competitive Environment” by Marcelo Yannuzzi et al., addresses this issue and introduces an IRC approach for competitive environments, by blending randomization with adaptive filtering techniques.

We hope that these articles will help to clarify and explain the state-of-the-art advances on middlebox issues in the Internet, providing current visions of how the behaviors, implications, and control of middleboxes may be analyzed, encompassed, and utilized. In preparing this special issue, we wish to thank all the peer reviewers for their efforts in carefully reviewing the manuscripts to meet the tight deadlines. We are grateful to our liaison editor Jon Crowcroft for his constructive feedback, and Editor-in-Chief Ioanis Nikolaidis for his timely and critical suggestions.

Biographies

XIAOMING FU [M’02] ([email protected]) received his Ph.D. degree in computer science from Tsinghua University, Beijing, China, in 2000. After almost two years of postdoctoral work at Technical University Berlin, he joined the University of Göttingen as an assistant professor, leading a team working on networking research. Since April 2007 he has been a professor and head of the Computer Networks Group at the University of Göttingen. During 2003–2005 he also served as an expert on the ETSI Specialist Task Forces on Internet Protocol Testing; he was also a visiting scientist at the University of Cambridge and Columbia University. In the research fields of architectures, protocols, and applications for QoS, firewalls, p2p overlay, and mobile networking as well as related security issues, he (co-)authored more than 50 refereed papers as well as several RFCs/I-Ds. He has served as TPC member and session chair for several conferences, including IEEE INFOCOM, ICNP, ICDCS, GLOBECOM, and ICC. He was also founding chair of the ACM Workshop on Mobility in the Evolving Internet Architecture (MobiArch) and is TPC Co-Chair of IEEE GLOBECOM 2009 Next Generation Networking and Internet Symposium. He is currently a member of the editorial board of Computer Communications Journal (Elsevier).

MARTIN STIEMERLING [M’00] ([email protected]) received his M.Sc. degree (Diploma) in electrical engineering with a focus on IP networking technologies from the Polytechnic University of Applied Sciences in Cologne in 2000. After that he joined the NEC Laboratories Europe, Heidelberg, Germany, where he is currently a senior researcher. His areas of research interest are Internet architecture, Internet signaling protocols, network management, and overlay/peer-to-peer systems. He has published several papers in these areas, and served as a TPC member of IEEE IPOM 2007. In the IETF he is active as working document editor in the MIDCOM, MMUSIC, and NSIS working groups, as well as in other IETF working groups and IRTF research groups. He is co-chair of the IETF Next Steps in Signaling (NSIS) working group, secretary of the IP over DVB (IPDVB) working group, and a co-author of RFC 3816, RFC 3989, and RFC 4540, as well as RTSPng.

HENNING SCHULZRINNE [F’06] ([email protected]) received his Ph.D. from the University of Massachusetts in Amherst, Massachusetts. He was a member of technical staff at AT&T Bell Laboratories, Murray Hill, New Jersey, and an associate department head at GMD-Fokus (Berlin) before joining the Computer Science and Electrical Engineering Departments at Columbia University, New York. He is currently a professor and chair of the Department of Computer Science. He has been a member of the Board of Governors of the IEEE Communications Society and is vice chair of ACM SIGCOMM, former chair of the IEEE Communications Society Technical Committees on Computer Communications and the Internet, has been technical program chair of Global Internet, INFOCOM, NOSSDAV, and IPTCOMM, and was General Chair of ACM Multimedia 2004. He has also been a member of the Internet Architecture Board. Protocols co-developed by him, such as RTP, RTSP, and SIP, are now Internet standards, used by almost all Internet telephony and multimedia applications. His research interests include Internet multimedia systems, ubiquitous computing, mobile systems, quality of service, and performance evaluation.



A Retrospective View of Network Address Translation

Lixia Zhang, University of California, Los Angeles

Abstract

Today, network address translators, or NATs, are everywhere. Their ubiquitous adoption was not promoted by design or planning but by the continued growth of the Internet, which places an ever-increasing demand not only on IP address space but also on other functional requirements that network address translation is perceived to facilitate. This article presents a personal perspective on the history of NATs, their pros and cons in a retrospective light, and the lessons we can learn from the NAT experience.

A network address translator (NAT) commonly refers to a box that interconnects a local network to the public Internet, where the local network runs on a block of private IPv4 addresses as specified in RFC 1918 [1]. In the original design of the Internet architecture, each IP address was defined to be globally unique and globally reachable. In contrast, a private IPv4 address is meaningful only within the scope of the local network behind a NAT and, as such, the same private address block can be reused in multiple local networks, as long as those networks do not directly talk to each other. Instead, they communicate with each other and with the rest of the Internet through NAT boxes.

Like most unexpected successes, the ubiquitous adoption of NATs was not foreseen when the idea first emerged more than 15 years ago [2, 3]. Had anyone foreseen where NAT would be today, it is possible that NAT deployment might have followed a different path, one that was better planned and standardized. The set of Internet protocols that were developed over the past 15 years also might have evolved differently by taking into account the existence of NATs, and we might have seen less overall complexity in the Internet compared to what we have today.

Although the clock cannot be turned back, I believe it is a worthwhile exercise to revisit the history of network address translation to learn some useful lessons. It also can be worthwhile to assess, or reassess, the pros and cons of NATs, as well as to take a look at where we are today in our understanding of NATs and how best to proceed in the future.

It is worth pointing out that in recent years many efforts were devoted to the development and deployment of NAT traversal solutions, such as simple traversal of UDP through NAT (STUN) [4], traversal using relays around NAT (TURN) [5], and Teredo [6], to name a few. These solutions remove obstacles introduced by NATs to enable an increasing number of new application deployments. However, as the title suggests, this article focuses on examining the lessons that we can learn from the NAT deployment experience; a comprehensive survey of NAT traversal solutions must be reserved for a separate article.

I also emphasize that this writing represents a personal view, and my recall of history is likely to be incomplete and to contain errors. My personal view on this subject has also changed over time, and it may continue to evolve, as we are all in a continuing process of understanding the fascinating and dynamically changing Internet.

How a NAT Works

As mentioned previously, IP addresses originally were designed to be globally unique and globally reachable. This property of the IP address is a fundamental building block in supporting the end-to-end architecture of the Internet. Until recently, almost all of the Internet protocol designs, especially those below the application layer, were based on the aforementioned IP address model. However, the explosive growth of the Internet during the 1990s not only signaled the danger of IP address space exhaustion, but also created an instant demand on IP addresses: suddenly, connecting large numbers of user networks and home computers demanded IP addresses instantly and in large quantities. Such demand could not possibly be met by going through the regular IP address allocation process. Network address translation came into play to meet this instant high demand, and NAT products were quickly developed to meet the market demand.

However, because NATs were not standardized before their wide deployment, a number of different NAT products exist today, each with somewhat different functionality and different technical details. Because this article is about the history of NAT deployment, and not an examination of how to traverse various different NAT boxes, I briefly describe a popular NAT implementation as an illustrative example. Interested readers can visit Wikipedia to find out more about existing types of NAT products.

A NAT box N has a public IP address for its interface connecting to the global Internet and a private address facing the internal network. N serves as the default router for all of the destinations that are outside the local NAT address block. When an internal host H sends an IP packet P to a public IP destination address D located in the global Internet, the packet is routed to N. N translates the private source IP address in P’s header to N’s public IP address and adds an entry to its internal table that keeps track of the mapping between the internal host and the outgoing packet. This entry represents a piece of state, which enables subsequent packet exchanges between H and D. For example, when D sends a packet P′ in response to P, P′ arrives at N, and N can find the corresponding entry from its mapping table and replace the destination IP address (which is its own public IP address) with the real destination address H, so that P′ will be delivered to H. The mapping entry times out after a certain period of idleness that is typically set to a vendor-specific value. In the process of changing the IP address carried in the IP header of each passing packet, a NAT box also must recalculate the IP header checksum, as well as the checksum of the transport protocol if it is calculated based on the IP address, as is the case for Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) checksums.
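The translation steps described above can be sketched in a few lines of Python. This is only an illustrative toy, not any vendor's implementation: the class and field names, the starting port number, and the 60-second idle timeout are all assumptions. It also assumes the NAT rewrites the source port so that replies to different internal hosts can be told apart, a detail the description above does not spell out.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class Nat:
    """Toy port-translating NAT (hypothetical names and timeout values)."""

    def __init__(self, public_ip, idle_timeout=60.0, clock=time.monotonic):
        self.public_ip = public_ip
        self.idle_timeout = idle_timeout
        self.clock = clock
        self.next_port = 40000  # arbitrary first public port to hand out
        self.out_map = {}       # (private_ip, private_port) -> public_port
        self.in_map = {}        # public_port -> [private_ip, private_port, last_used]

    def _expire(self):
        """Drop mapping entries that have been idle longer than the timeout."""
        now = self.clock()
        for port, (ip, pport, last) in list(self.in_map.items()):
            if now - last > self.idle_timeout:
                del self.in_map[port]
                del self.out_map[(ip, pport)]

    def outbound(self, pkt):
        """Rewrite the private source to the public address, creating state."""
        self._expire()
        key = (pkt.src_ip, pkt.src_port)
        port = self.out_map.get(key)
        if port is None:
            port = self.next_port
            self.next_port += 1
            self.out_map[key] = port
            self.in_map[port] = [pkt.src_ip, pkt.src_port, self.clock()]
        else:
            self.in_map[port][2] = self.clock()  # refresh the idle timer
        return Packet(self.public_ip, port, pkt.dst_ip, pkt.dst_port)

    def inbound(self, pkt):
        """Rewrite the public destination back to the internal host, if mapped."""
        self._expire()
        entry = self.in_map.get(pkt.dst_port)
        if pkt.dst_ip != self.public_ip or entry is None:
            return None  # no mapping entry: the packet cannot be delivered
        entry[2] = self.clock()
        return Packet(pkt.src_ip, pkt.src_port, entry[0], entry[1])
```

In this sketch an inbound packet is delivered only if an internal host created the mapping first, and the mapping disappears after the idle timeout, which mirrors the state-dependence discussed in the text.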

From this brief description, it is easy to see the major benefit of a NAT: one can connect a large number of hosts to the global Internet by using a single public IP address. A number of other benefits of NATs also became clear over time, which I will discuss in more detail later.

At the same time, a number of drawbacks to NATs also can be identified immediately. First and foremost, the NAT changed the end-to-end communication model of the Internet architecture in a fundamental way: instead of allowing any host to talk directly to any other host on the Internet, the hosts behind a NAT must go through the NAT to reach others, and all communications through a NAT box must be initiated by an internal host to set up the mapping entries on the NAT. In addition, because ongoing data exchange depends on the mapping entry kept at the NAT box, the box represents a single point of failure: if the NAT box crashes, it could lose all the existing state, and the data exchange between all of the internal and external hosts must be restarted. This is in contrast to the original goal of IP of delivering packets to their destinations, as long as any physical connectivity exists between the source and destination hosts. Furthermore, because a NAT alters the IP addresses carried in a packet, all protocols that are dependent on IP addresses are affected. In certain cases, such as TCP checksum, which includes IP addresses in the calculation, the NAT box can hide the address change by recalculating the TCP checksum when forwarding a packet. For some of the other protocols that make direct use of IP addresses, such as IPSec [7], the protocols can no longer operate on the end-to-end basis as originally designed; for some application protocols, for example, File Transfer Protocol (FTP) [8], that embed IP addresses in the application data, application-level gateways are required to handle the IP address rewrite. As discussed later, NAT also introduced other drawbacks that surfaced only recently.
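To make the checksum dependence concrete, the following sketch computes a TCP checksum over the pseudo-header (source address, destination address, protocol, length) plus the segment, before and after the source address is rewritten. The addresses, ports, and the bare 20-byte header are made-up example values, and the function names are illustrative only.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum of 16-bit words, then complemented."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header followed by the TCP segment."""
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, socket.IPPROTO_TCP, len(segment)))
    return internet_checksum(pseudo + segment)


# A minimal 20-byte TCP header (SYN) with the checksum field set to zero.
segment = struct.pack("!HHIIBBHHH", 5000, 80, 0, 0, 5 << 4, 0x02, 65535, 0, 0)

before = tcp_checksum("10.0.0.5", "198.51.100.7", segment)    # private source
after = tcp_checksum("203.0.113.1", "198.51.100.7", segment)  # NAT's public source
print(hex(before), hex(after))
```

The two values differ even though the segment bytes are identical, so a NAT that rewrites the source address without also updating the TCP checksum would cause the receiver to discard the segment.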

A Recall of the History of NATs

I started my Ph.D. studies in the networking area at the Massachusetts Institute of Technology at the same time as RFC 791 [9], the Internet Protocol Specification, was published in September 1981. Thus I was fortunate to witness the fascinating unfolding of this new system called the Internet. During the next ten years, the Internet grew rapidly. RFC 1287 [2], “Towards the Future Internet Architecture,” was published in 1991 and was probably the first RFC that raised a concern about IP address space exhaustion in the foreseeable future.

RFC 1287 also discussed three possible directions to extend IP address space. The first one pointed to a direction similar to current NATs:

Replace the 32-bit field with a field of the same size but with a different meaning. Instead of being globally unique, it would be unique only within some smaller region. Gateways on the boundary would rewrite the address as the packet crossed the boundary.

RFC 1335 [3], published shortly after RFC 1287, provided a more elaborate description of the use of internal IP addresses (i.e., private IP addresses) as a solution to IP address exhaustion. The first article describing the NAT idea, “Extending the IP Internet through Address Reuse” [10], appeared in the January 1993 issue of ACM Computer Communication Review and was published a year later as RFC 1631 [11]. Although these RFCs can be considered forerunners in the development of NAT, as explained later, for various reasons the IETF did not take action to standardize NAT.

The invention of the Web further accelerated Internet growth in the early 1990s. The explosive growth underlined the urgency to take action toward solving both the routing scalability and the address shortage problems. The IETF took several follow-up steps, which eventually led to the launch of the IPng development effort. I believe that the expectation at the time was to develop a new IP within a few years, followed by a quick deployment. However, the actual deployment during the next ten years took a rather unexpected path.

The Planned Solution

As pointed out in RFC 1287, the continued growth of the Internet exposed strains on the original design of the Internet architecture, the two most urgent of which were routing system scalability and the exhaustion of IP address space. Because long-term solutions require a long lead time to develop and deploy, efforts began to develop both a short-term and a long-term solution to those problems.

Classless inter-domain routing, or CIDR, was proposed as a short-term solution. CIDR removed the class boundaries embedded in the IP address structure, thus enabling more efficient address allocation, which helped extend the lifetime of IP address space. CIDR also facilitated routing aggregation, which slowed down the growth of the routing table size. However, as stated in RFC 1481 [12], “IAB Recommendation for an Intermediate Strategy to Address the Issue of Scaling”: “This strategy (CIDR) presumes that a suitable long-term solution is being addressed within the Internet technical community.” Indeed, a number of new IETF working groups started in late 1992 and aimed at developing a new IP as a long-term solution; the Internet Engineering Steering Group (IESG) set up a new IPng area in 1993 to coordinate the efforts, and the IPng Working Group (later renamed to IPv6) was established in the fall of 1994 to develop a new version of IP [13].
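Both effects of CIDR mentioned above, tighter allocation and route aggregation, can be illustrated with Python's standard `ipaddress` module. The prefixes and the 1000-address site are arbitrary example values, and the helper name `smallest_prefix_for` is made up for this sketch; it ignores the network and broadcast addresses for simplicity.

```python
import ipaddress


def smallest_prefix_for(addresses_needed: int) -> int:
    """Longest IPv4 prefix length whose block holds at least that many addresses."""
    return 32 - max(addresses_needed - 1, 0).bit_length()


# Allocation: a site that needs about 1000 addresses.
needed = 1000
classful_size = 2 ** 16                       # a class C (256) is too small, so a class B
cidr_prefix = smallest_prefix_for(needed)     # 22
cidr_size = 2 ** (32 - cidr_prefix)           # 1024
print(f"classful: {classful_size} addresses, CIDR /{cidr_prefix}: {cidr_size} addresses")

# Aggregation: four contiguous /24 routes collapse into a single /22 entry.
routes = [ipaddress.ip_network(f"198.18.{i}.0/24") for i in range(4)]
aggregated = list(ipaddress.collapse_addresses(routes))
print(aggregated)
```

Under these assumptions the classful allocation hands out 64 times more addresses than the site needs, while the four separate routing entries shrink to one.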

CIDR was rolled out quickly, which effectively slowed the growth of the global Internet routing table. Because it is a quick fix, CIDR did not address emerging issues in routing scalability, in particular the issue of site multihoming. A multihomed site should be reachable through any of its multiple provider networks. In the existing routing architecture, this requirement translates to having the prefix, or prefixes, of the site listed in the global routing table, thereby rendering provider-based prefix aggregation ineffective. Interested readers are referred to [14] for a more detailed description of multihoming and its impact on routing scalability.

The new IP development effort, on the other hand, took much longer than anyone expected when the effort first began. The IPv6 working group finally completed all of the protocol development effort in 2007, 13 years after its establishment. The IPv6 deployment also is slow in coming. Until recently, there were relatively few IPv6 trial deployments; there is no known commercial user site that uses IPv6 as the primary protocol for its Internet connectivity.

If one day someone writes an Internet protocol development history, it would be very interesting to look back and understand the major reasons for the slow development and adoption of IPv6. But even without doing any research, one could say with confidence that NATs played a major role in meeting the IP address requirement that arose out of the Internet growth and at least deferred the demand for a new IP to provide the much needed address space to enable the continued growth of the Internet.

The Unplanned Reality

Although largely unexpected, NATs have played a major role in facilitating the explosive growth of Internet access. Nowadays, it is common to see multiple computers, or even multiple LANs, in a single home. It would be unthinkable for every home to obtain an IP address block, however small it may be, from its network service provider. Instead, a common implementation for home networking is to install a NAT box that connects one home network or multiple home networks to a local provider. Similarly, most enterprise networks deploy NATs as well. It also is well known that countries with large populations, such as India and China, have most of their hosts behind NAT boxes; the same is true for countries that connected to the Internet only recently. Without NATs, the IPv4 address space would have been exhausted a long time ago.

For reasons discussed later, the IETF did not standardize NAT implementation or operations. However, despite the lack of standards, NATs were implemented by multiple vendors, and the deployment spread like wildfire. This is because NATs have several attractions, as we describe next.

Why NATs Succeeded

NATs started as a short-term solution while waiting for a new IP to be developed as the long-term solution. The first recognized NAT advantages were stated in RFC 1918 [1]:

With the described scheme many large enterprises will need only a relatively small block of addresses from the globally unique IP address space. The Internet at large benefits through conservation of globally unique address space, which will effectively lengthen the lifetime of the IP address space. The enterprises benefit from the increased flexibility provided by a relatively large private address space.

The last point deserves special emphasis. Indeed, anyone can use a large block of private IP addresses (up to 16 million, without asking for permission) and then connect to the rest of the Internet by using only a single public IP address. A big block of private IP addresses provides the much needed room for future growth. On the other hand, for most if not all user sites, it is often difficult to obtain an IP address block that is beyond their immediate requirements.
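The "up to 16 million" figure follows directly from the size of the largest private block. As a quick check, the sketch below lists the three private address blocks set aside by RFC 1918 and counts the addresses in each using Python's standard `ipaddress` module; the variable names are illustrative.

```python
import ipaddress

# The three private IPv4 blocks defined in RFC 1918.
RFC1918_BLOCKS = ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")

sizes = {}
for prefix in RFC1918_BLOCKS:
    net = ipaddress.ip_network(prefix)
    sizes[prefix] = net.num_addresses  # 2 ** (32 - prefix length)
    print(f"{prefix:<16} {net.num_addresses:>12,} addresses  private={net.is_private}")
```

The largest block, 10.0.0.0/8, contains 2^24 = 16,777,216 addresses, which is the source of the 16 million figure.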

Today, NAT is believed to offer advantages well beyond the above. Essentially, the mapping table of a NAT provides one level of indirection between hosts behind the NAT and the global Internet. As the popular saying goes, “Any problem in computer science can be solved with another layer of indirection.” This one level of indirection means that one never need worry about renumbering the internal network when changing providers, other than renumbering the public IP address of the NAT box.

Similarly, a NAT box also makes multihoming easy. One NAT box can be connected to multiple providers and use one IP address from each provider. Not only does the NAT box shelter the connectivity to multiple ISPs from all the internal hosts, but it also does not require any of its providers to “punch a hole” in the routing announcement (i.e., make an ISP de-aggregate its address block). Such a hole punch would be required if the multihomed site takes an IP address block from one of its providers and asks the other providers to announce the prefix.

Furthermore, this one level of indirection also is perceived as one level of protection because external hosts cannot directly initiate communication with hosts behind a NAT, nor can they easily figure out the internal topology.

Besides all of the above, two additional factors also contributed greatly to the quick adoption of NATs. First, NATs can be unilaterally deployed by any end site without any coordination with anybody else. Second, the major gains from deploying a NAT were realized on day one, whereas its potential drawbacks were revealed only slowly and recently.

The Other Side of the NAT

A NAT prevents the hosts behind it from being reached by an external host and hence prevents them from acting as servers. However, in the early days of NAT deployment, many people believed that they would have no need to run servers behind a NAT. Thus, this architectural constraint was viewed as a security feature and believed to have little impact on users or network usage. As an example, the following four justifications for the use of private addresses are quoted directly from RFC 1335 [3].

• In most networks, the majority of the traffic is confined to its local area networks. This is due to the nature of networking applications and the bandwidth constraints on inter-network links.

• The number of machines that act as Internet servers, that is, run programs waiting to be called by machines in other networks, is often limited and certainly much smaller than the total number of machines.

• There are an increasingly large number of personal machines entering the Internet. The use of these machines is primarily limited to their local environment. They also can be used as clients such as ftp and telnet to access other machines.

• For security reasons, many large organizations, such as banks, government departments, military institutions, and some companies, allow only a very limited number of their machines to have access to the global Internet. The majority of their machines are purely for internal use.

As time goes on, however, the above reasoning has largely been proven wrong.

First, network bandwidth is no longer a fundamental constraint today. On the other hand, voice over IP (VoIP) has become a popular application over the past few years. VoIP changed the communication paradigm from client-server to a peer-to-peer model, meaning that any host may call any other host. Given the large number of Internet hosts that are behind NAT, several NAT traversal solutions have been developed to support VoIP. A number of other peer-to-peer applications, such as BitTorrent, also have become popular recently, and each must develop its own NAT traversal solutions.

In addition to the change of application patterns, a few other problems also arise due to the use of non-unique, private IP addresses with NATs. For instance, a number of business acquisitions and mergers have run into situations where two networks behind NATs were required to be interconnected, but unfortunately, they were running on the same private address block, resulting in address conflicts. Yet another problem emerged more recently. The largest allocated private address block is 10.0.0.0/8, commonly referred to as net-10. The business growth of some provider and enterprise networks is leading to, or already has resulted in, the net-10 address exhaustion. An open question facing these networks is what to do next. One provider network migrated to IPv6; a number of others simply decided on their own to use another unallocated IP address block [15].
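The merger scenario can be made concrete with a small check for overlapping private prefixes. The two address plans below are invented for illustration, as is the helper name `conflicts`; the overlap test itself uses the standard `ipaddress` module.

```python
import ipaddress


def conflicts(site_a, site_b):
    """Return pairs of prefixes (one per site) whose address ranges overlap."""
    nets_a = [ipaddress.ip_network(p) for p in site_a]
    nets_b = [ipaddress.ip_network(p) for p in site_b]
    return [(a, b) for a in nets_a for b in nets_b if a.overlaps(b)]


# Hypothetical address plans of two companies that are about to merge.
company_a = ["10.1.0.0/16", "192.168.10.0/24"]
company_b = ["10.1.2.0/24", "172.16.0.0/16"]

clashes = conflicts(company_a, company_b)
for a, b in clashes:
    print(f"{a} overlaps {b}")
```

Because both sites drew independently from the same private space, the two 10.1.x.x prefixes collide, and one side would have to renumber or add another layer of translation before the networks could be interconnected.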

It is also a common misperception that a NAT box makes an effective firewall. This may be due partly to the fact that in places where NAT is deployed, the firewall function often is implemented in the NAT box. A NAT box alone, however, does not make an effective firewall, as evidenced by the fact that numerous home computers behind NAT boxes have been compromised and have been used as launch pads for spam or distributed denial of service (DDoS) attacks. Firewalls establish control policies on both incoming and outgoing packets to minimize the chances of internal computers being compromised or abused. Making a firewall serve as a NAT box does not make it more effective in fencing off malicious attacks; good control policies do.

Why the Opportunity of Standardizing NAT Was Missed

During the decade following the deployment of NATs, a big debate arose in the IETF community regarding whether NAT should, or should not, be deployed. Due to its use of private addresses, NAT moved away from the basic IP model of providing end-to-end reachability between hosts, thus representing a fundamental departure from the original Internet architecture. This debate went on for years. As late as 2000, messages posted to the IETF mailing list by individual members still argued that NAT was architecturally unsound and that the IETF should in no way endorse its use or development. Such a position was shared by many people during that time.

These days most people would accept the position that the IETF should have standardized NAT early on. How did we miss the opportunity? A simple answer could be that the crystal ball was cloudy. I believe that a little digging would reveal a better understanding of the factors that clouded our eyes at the time. As I see it from my personal viewpoint, the following factors played a major role.

First, the feasibility of designing and deploying a brand new IP was misjudged, as were the time and effort required for such an undertaking. Those who were opposed to standardizing NAT had hoped to develop a new IP in time to meet the needs of a growing Internet. Unfortunately, the calculation was way off. While the development of a new IP was taking its time, Internet growth did not wait. Network address translation was simply an inevitable consequence that was not clearly recognized at the time.

Second, the community faced a difficult question regarding how strictly one should stick to architectural principles, and what can be acceptable engineering trade-offs. Architectural principles are guidelines for problem solving; they help guide us toward developing better overall solutions. However, when the direct end-to-end reachability model was interpreted as an absolute rule, it ruled out network address translation as a feasible means to meet the instant high demand for IP addresses at the time. Furthermore, sticking to the architectural model in an absolute way also contributed to the one-sided view of the drawbacks of NATs, hence the lack of a full appreciation of the advantages of NATs as we discussed earlier, let alone any effort to develop a NAT-traversal solution that can minimize the impact of NATs on end-to-end reachability.

Yet another factor was that, given that network address translation could be deployed unilaterally by a single party alone, there was not an apparent need for standardization. This seemingly valid reasoning missed an important fact: a NAT box does not stand alone; rather, it interacts both directly with surrounding IP devices and indirectly with remote devices through IP packet handling. The need for standardizing network address translation behavior has since been well recognized, and a great effort has been devoted to developing NAT standards in recent years [16].

Unfortunately, the early misjudgment on NAT already has cost us dearly. While the big debate went on through the late 1990s and early part of the first decade of this century, NAT deployment was widely rolled out, and the absence of a standard led to a number of different behaviors among various NAT products. A number of new Internet protocols also were developed or finalized during the same time period, such as IPSec, Session Announcement Protocol (SAP), and Session Initiation Protocol (SIP), to name a few. Their designs were based on the original model of IP architecture, wherein IP addresses are assumed to be globally unique and globally reachable. When those protocols became ready for deployment, they faced a world that was mismatched with their design. Not only were they required to solve the NAT traversal problem, but the solutions also were required to deal with a wide variety of NAT box behaviors.

Although NAT is accepted as a reality today, the lessons to learn from the past are yet to be clarified. One example is the recent debate over Class-E address block usage [17]. Class-E refers to the IP address block 240.0.0.0/4 that has been on reserve until now. As such, many existing router and host implementations block the use of Class-E addresses. Putting aside the issue of required router and host changes to enable Class-E usage, the fundamental debate has been about whether this Class-E address block should go into the public address allocation pool or into the collection of private address allocations. The latter would give those networks that face net-10 exhaustion a much bigger private address block to use. However, this gain is also one of the main arguments against it, as the size limitation of private addresses is considered a pressure to push those networks facing the limitation to migrate to IPv6, instead of staying with NAT. Such a desire sounds familiar; similar arguments were used against NAT standardization in the past. However, if the past is any indication of the future, we know that pressures do not dictate new protocol deployment; rather, economic feasibility does. This statement does not imply that migrating to IPv6 lacks economic feasibility. On the contrary, it is feasible, especially in the long run. New efforts are being organized both in protocol and tools development to smooth and ease the transition from IPv4 to IPv6 and in case studies and documentation to show clearly the short- and long-term gains from deploying IPv6.
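For a sense of scale in the Class-E debate, the sizes of the two blocks can be compared directly. This is only an arithmetic illustration using Python's standard `ipaddress` module; the variable names are made up.

```python
import ipaddress

class_e = ipaddress.ip_network("240.0.0.0/4")
net10 = ipaddress.ip_network("10.0.0.0/8")

# A /4 holds 2**28 addresses; a /8 holds 2**24.
ratio = class_e.num_addresses // net10.num_addresses
print(f"Class-E: {class_e.num_addresses:,} addresses, {ratio}x the size of net-10")

# Python's ipaddress module itself still classifies this range as reserved.
print(ipaddress.ip_address("240.0.0.1").is_reserved)
```

Used as private space, the Class-E block would therefore offer sixteen times the room of net-10, which explains both its appeal to networks facing net-10 exhaustion and the concern that it would relieve the pressure to move to IPv6.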

Looking Back and Looking Forward

The IPv4 address space exhaustion predicted long ago is finally upon us today, yet the IPv6 deployment is barely visible on the horizon. What can and should be done now to enable the Internet to grow along the best path forward? I hope this review of NAT history helps shed some light on the answer.


First, we should recognize not only the fact that IPv4 network address translation is widely deployed today, but also its perceived benefits to end users as we discussed in a previous section. We should have a full appraisal of the pros and cons of NAT boxes; the discussion in this article merely serves as a starting point.

Second, it is likely that some forms of network address translation boxes will be with us forever. Hopefully, a full appraisal of the pros and cons of network address translation would help correct the view that all network address translation approaches are a “bad thing” and must be avoided at all costs. Several years ago, an IPv4 to IPv6 transition scheme called Network Address Translation-Protocol Translation (NAT-PT; see [18]) was developed but later classified to historical status,¹ mainly due to the concerns that:

• NAT-PT works in much the same way as an IPv4 NAT box.

• NAT-PT does not handle all the transition cases.

However, in view of IPv4 NAT history, it seems worthwhile to revisit that decision. IPv4, together with IPv4 NAT, will be with us for years to come. NAT-PT seems to offer a unique value in bridging IPv4-only hosts and applications with IPv6-enabled hosts and networks. There also have been discussions of the desire to perform address translations between IPv6 networks as a means to achieve several goals, including insulating one’s internal network from the outside. This question of “Whither IPv6 NAT?” deserves further attention. Instead of repeating the mistakes with IPv4 NAT, the Internet would be better off with well-engineered standards and operational guidelines for traversing IPv4 and IPv6 NATs that aim at maximizing interoperability.

Furthermore, accepting the existence of network address translation in today’s architecture does not mean we simply take the existing NAT traversal solutions as given. Instead, we should fully explore the NAT traversal design space to steer the solution development toward restoring the end-to-end reachability model in the original Internet architecture. A new effort in this direction is the NAT traversal through tunneling (NATTT) project [19]. Contrary to most existing NAT traversal solutions that are server-based or protocol-specific, NATTT aims to restore end-to-end reachability among Internet hosts in the presence of NATs, by providing generic, incrementally deployable NAT-traversal support for all applications and protocols.

Last, but not least, I believe it is important to understand that successful network architectures can and should change over time. All new systems start small. Once successful, they grow larger, often by multiple orders of magnitude as is the case of the Internet. Such growth brings the system to an entirely new environment that the original designers may not have envisioned, together with a new set of requirements that must be met, hence the necessity for architectural adjustments.

To properly adjust a successful architecture, we must have a full understanding of the key building blocks of the architecture, as well as the potential impact of any changes to them. I believe the IP address is this kind of key building block that touches, directly or indirectly, all other major components in the Internet architecture. The impact of IPv4 NAT, which changed IP address semantics, provides ample evidence. During IPv6 development, much of the effort also involved a change in IP address semantics, such as the introduction of new concepts like that of the site-local address. The site-local address was later abolished and partially replaced by unique local IPv6 unicast addresses (ULA) [20], another new type of IP address. The debate over the exact meaning of ULA is still going on.

The original IP design clearly defined an IP address as being globally unique and globally reachable and as identifying an attachment point to the Internet. As the Internet continues to grow and evolve, recent years have witnessed an almost universal deployment of middleboxes of various types. NATs and firewalls are dominant among deployed middleboxes, though we also are seeing increasing numbers of SIP proxies and other proxies to enable peer-to-peer-based applications. At the same time, proposals to change the original IP address definition, or even redefine it entirely, continue to arise. What should be the definition, or definitions, of an IP address today, especially in the face of various middleboxes? I believe an overall examination of the role of the IP address in today’s changing architecture deserves special attention at this critical time in the growth of the Internet.

Acknowledgments

I sincerely thank Mirjam Kuhne and Wendy Rickard for their help with an earlier version of this article that was posted in the online IETF Journal of October 2007. I also thank the co-editors and reviewers of this special issue for their invaluable comments.

References

[1] Y. Rekhter et al., “Address Allocation for Private Internets,” RFC 1918, 1996.
[2] D. Clark et al., “Towards the Future Internet Architecture,” RFC 1287, 1991.
[3] Z. Wang and J. Crowcroft, “A Two-Tier Address Structure for the Internet: A Solution to the Problem of Address Space Exhaustion,” RFC 1335, 1992.
[4] J. Rosenberg et al., “STUN: Simple Traversal of User Datagram Protocol (UDP) through Network Address Translators (NATs),” RFC 3489, 2003.
[5] J. Rosenberg, R. Mahy, and P. Matthews, “Traversal Using Relays around NAT (TURN),” draft-ietf-behave-turn-08, 2008.
[6] C. Huitema, “Teredo: Tunneling IPv6 over UDP through Network Address Translations (NATs),” RFC 4380, 2006.
[7] S. Kent and R. Atkinson, “Security Architecture for the Internet Protocol,” RFC 2401, 1998.
[8] J. Postel and J. Reynolds, “File Transfer Protocol (FTP),” RFC 959, 1985.
[9] J. Postel, “Internet Protocol Specification,” RFC 791, 1981.
[10] P. Tsuchiya and T. Eng, “Extending the IP Internet through Address Reuse,” ACM SIGCOMM Computer Commun. Review, Sept. 1993.
[11] K. Egevang and P. Francis, “The IP Network Address Translator (NAT),” RFC 1631, 1994.
[12] C. Huitema, “IAB Recommendation for an Intermediate Strategy to Address the Issue of Scaling,” RFC 1481, 1993.
[13] R. M. Hinden, “IP Next Generation Overview,” http://playground.sun.com/ipv6/INET-IPng-Paper.html, 1995.
[14] L. Zhang, “An Overview of Multihoming and Open Issues in GSE,” IETF J., Sept. 2006.
[15] L. Vegoda, “Used but Unallocated: Potentially Awkward /8 Assignments,” Internet Protocol J., Sept. 2007.
[16] http://www.ietf.org/html.charters/behave-charter.html; IETF BEHAVE Working Group develops requirements documents and best current practices to enable NATs to function in a deterministic way, as well as advises on how to develop applications that discover and reliably function in environments with the presence of NATs.
[17] http://www.ietf.org/mail-archive/web/int-area/current/msg01299.html; see the message dated 12/5/07 with subject line “240/4” and all the follow-up.
[18] G. Tsirtsis and P. Srisuresh, “Network Address Translation-Protocol Translation (NAT-PT),” RFC 2766, 2000.
[19] E. Osterweil et al., “NAT Traversal through Tunneling (NATTT),” http://www.cs.arizona.edu/~bzhang/nat/
[20] R. M. Hinden and B. Haberman, “Unique Local IPv6 Unicast Addresses,” RFC 4193, 2005.

Biography

LIXIA ZHANG ([email protected]) received her Ph.D. in computer science from the Massachusetts Institute of Technology. She was a member of research staff at the Xerox Palo Alto Research Center before joining the faculty of the UCLA Computer Science Department in 1995. In the past she served as vice chair of ACM SIGCOMM and co-chair of the IEEE ComSoc Internet Technical Committee. She is currently serving on the Internet Architecture Board.

1 Historical status means that a protocol is considered obsolete and is thus removed from the Internet standard protocol set.


Behavior and Classification of NAT Devices and Implications for NAT Traversal

Andreas Müller and Georg Carle, Technische Universität München
Andreas Klenk, Universität Tübingen

Abstract

For a long time, traditional client-server communication was the predominant communication paradigm of the Internet. Network address translation devices emerged to help with the limited availability of IP addresses and were designed with the hypothesis of asymmetric connection establishment in mind. But with the growing success of peer-to-peer applications, this assumption is no longer true. Consequently network address translation traversal became a field of intensive research and standardization for enabling efficient operation of new services. This article provides a comprehensive overview of NAT and introduces established NAT traversal techniques. A new categorization of applications into four NAT traversal service categories helps to determine applicable techniques for NAT traversal. The interactive connectivity establishment framework is categorized, and a new framework is introduced that addresses scenarios that are not supported by ICE. Current results from a field test on NAT behavior and the success ratio of NAT traversal techniques support the feasibility of this classification.

0890-8044/08/$25.00 © 2008 IEEE

When the Internet Protocol (IP) was designed, the growth of the Internet to its current size was not imaginable. Therefore, it was reasonable to use a fixed 32-bit field to identify a host based on its IP address. This limited address range makes it impossible to assign globally unique IPv4 addresses to the growing number of networked devices. Furthermore, requesting an IP address for every newly added device results in an unacceptable administration overhead. The authors in [1] propose to assign a number of public IP addresses to a designated border router instead of configuring certain hosts with addresses that can be routed globally. The border router is then responsible for translating IP addresses between the private and the public domains, allowing as many simultaneous connections as public IP addresses were assigned. This allows a host within the local network to access the Internet even though it has a private IP address. This technique became known as network address translation (NAT). Because the translation of addresses breaks the end-to-end connectivity model of the IP, newly developed services following the peer-to-peer (P2P) paradigm such as file sharing, instant messaging, and voice over IP (VoIP) applications suffer from the existence of NAT. Thus, NAT traversal is an important problem today. And even in the future, after a possible success of IPv6, companies and home users still might deploy NAT devices to hide their topologies from Internet service providers (ISPs).

There are two possible approaches to the problem. One direction within the Internet Engineering Task Force (IETF) Behave Working Group [2] is to cope with existing NAT implementations and to establish standards for the detection of NAT behavior and for NAT traversal. On the other hand, the IETF also standardizes behavioral properties for NATs to work in conjunction with IETF protocols (e.g., Datagram Congestion Control Protocol [DCCP], Internet Control Message Protocol [ICMP], Stream Control Transmission Protocol [SCTP]). Enterprise class NATs are among the first to incorporate new features introduced through standardization. However, the large scale deployment of residential gateways with NAT functionality prohibits the change of NAT and requires the use of protocols that work with existing NATs. This is also the focus of this article, where we treat NATs as black boxes rather than trying to change them.

NAT Behavior

Today, a NAT device usually is used to share a single public IP address among a number of private end systems. The NAT maintains a table, listing all connections between the public and the private domains. For every connection attempt (e.g., a Transmission Control Protocol synchronize [TCP SYN] packet) coming from an internal host, the NAT creates a new entry in the list. In NAT terminology this entry is called a binding [3]. Each entry contains the source IP address and the source port. The NAT replaces the source IP address with its public IP address. The source port is replaced using one of the strategies explained later in this section.

Although the concept of NAT was published as early as 1994 [1], no common approach for NAT emerged. Current NAT implementations not only differ from vendor to vendor but also from model to model, which leads to compatibility issues. If an application works with one particular NAT, this does not imply that it always works in a NATed environment. Therefore, it is very important to understand and classify existing NAT implementations in order to design applications that can work in combination with current NATs. The classification in this article is mainly derived from simple traversal of User Datagram Protocol (UDP) through NAT (STUN) [4], whereas the address binding and mapping behavior follows the terminology used in RFC 4787 [5]. This section covers only topics that are required for the understanding of this article. A detailed discussion and further information (including test results) is given in [6] (for TCP) and [5] (for UDP).

Binding covers “context based packet translation” [7], which describes the strategy the NAT uses to assign a public transport address (combination of IP address and port) to a new state in the NAT. Filtering, or packet discard, shows how the NAT handles (or discards) packets trying to use an existing mapping. Table 1 shows the different categories and their possible properties. Port binding describes the strategy a NAT uses for the assignment. With port preservation, the NAT assigns an external port to a new connection; it attempts to preserve the local port number if possible. Port overloading is problematic and rarely occurs. A new connection takes over the binding, and the old connection is dropped. Port multiplexing is a very common strategy where ports are demultiplexed based on the destination transport address. Incoming packets can now carry the same destination port and are distinguished by the source transport address.
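Port preservation can be illustrated with a minimal sketch. The function name, the set of ports already in use, and the fallback range 49152 to 65535 are assumptions made for this example only; real NATs choose fallback ports in implementation-specific ways.

```python
def assign_external_port(local_port, ports_in_use,
                         first_fallback=49152, last_fallback=65535):
    """Return an external port, preserving the local port when it is free.

    The fallback range is a hypothetical choice for this sketch.
    """
    if local_port not in ports_in_use:
        return local_port  # port preservation succeeds
    for candidate in range(first_fallback, last_fallback + 1):
        if candidate not in ports_in_use:
            return candidate  # no port preservation: pick another free port
    raise RuntimeError("no free external port")


in_use = {5060}
print(assign_external_port(40000, in_use))  # local port is free, so it is kept
print(assign_external_port(5060, in_use))   # local port is taken, another is chosen
```

The sketch deliberately leaves out port overloading and port multiplexing, which need per-destination state rather than a single set of used ports.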

NAT binding deals with the reuse of existing bindings. That is, if an internal host closes a connection and establishes a new one from the same source port, NAT binding describes the assignment strategy for the new connection. As shown in Table 1, the NAT binding is organized into three categories. With Endpoint Independent, the external port is only dependent on the source transport address of the connection. As long as a host establishes a connection from the same source IP address and port, the mapping does not change. The assignment is dependent on the internal and the external transport address with the Address (Port) Dependent strategy. As long as consecutive connections from the same source to the same destination are established, the mapping does not change. As soon as we use a different destination, the NAT changes the external port. With a Connection Dependent binding, the NAT assigns a new port to every connection. We distinguish between NATs that increase the new port number by a specific (and well predictable) delta and NATs that assign random port numbers to the new mappings.
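The three NAT binding categories differ only in which part of a connection is used as the lookup key for an existing binding. The following toy model is a sketch under stated assumptions: the class name, the strategy labels, and the sequential port allocation starting at 20000 (a delta of 1) are hypothetical and not taken from any real NAT.

```python
import itertools


class NatBindingSimulator:
    """Toy model of the three NAT binding categories (not a real NAT)."""

    STRATEGIES = ("endpoint_independent", "address_dependent",
                  "connection_dependent")

    def __init__(self, strategy, first_port=20000):
        if strategy not in self.STRATEGIES:
            raise ValueError(strategy)
        self.strategy = strategy
        self._ports = itertools.count(first_port)  # sequential ports, delta 1
        self._table = {}

    def external_port(self, source, destination):
        """source and destination are (ip, port) tuples."""
        if self.strategy == "connection_dependent":
            return next(self._ports)  # a new port for every connection
        if self.strategy == "endpoint_independent":
            key = source  # only the internal transport address matters
        else:
            key = (source, destination)  # internal and external address
        if key not in self._table:
            self._table[key] = next(self._ports)
        return self._table[key]


src = ("192.168.1.10", 5000)
dst_a, dst_b = ("198.18.0.1", 80), ("198.18.0.2", 80)
for strategy in NatBindingSimulator.STRATEGIES:
    nat = NatBindingSimulator(strategy)
    ports = [nat.external_port(src, dst_a),
             nat.external_port(src, dst_a),
             nat.external_port(src, dst_b)]
    print(strategy, ports)
```

With the same source, an endpoint-independent model returns one port for all three connections, the address-dependent model changes the port only for the second destination, and the connection-dependent model changes it every time.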

Endpoint filtering describes how existing mappings can be used by external hosts and how a NAT handles incoming connection attempts that are not part of a response. Independent Filtering allows inbound connections independent of the source transport address of the packet. As long as the destination transport address of a packet matches an existing state, the packet is forwarded. With Address Restricted Filtering, the NAT forwards only packets coming from the same host (matching IP address) to which the initial packet was sent. Address and Port Restricted Filtering also compares the source port of the inbound packet in addition to address restricted filtering.
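The three filtering policies can be written as a single predicate. This is a minimal sketch: the function name and policy labels are assumptions, and the check presumes that the destination transport address of the inbound packet already matches an existing mapping.

```python
def accepts_inbound(policy, contacted, packet_source):
    """Decide whether an inbound packet may use an existing mapping.

    contacted: set of (ip, port) remote endpoints the internal host sent to
    packet_source: (ip, port) source transport address of the inbound packet
    """
    if policy == "independent":
        return True  # any source may use the mapping
    if policy == "address_restricted":
        return any(ip == packet_source[0] for ip, _ in contacted)
    if policy == "address_and_port_restricted":
        return packet_source in contacted  # IP address and port must match
    raise ValueError(policy)


contacted = {("198.18.0.1", 3478)}
print(accepts_inbound("address_restricted", contacted, ("198.18.0.1", 9999)))
print(accepts_inbound("address_and_port_restricted", contacted, ("198.18.0.1", 9999)))
```

The first call is accepted because only the IP address is compared; the second is rejected because the source port differs from the one the internal host contacted.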

NAT Traversal Problem

To work properly, the NAT must have access to the protocol headers at layers 3 and 4 (in case of a network address port translation [NAPT]). Additionally, for every incoming packet, the NAT must already have a state listed in its table. Otherwise, it cannot find the related internal host to which the packet belongs. According to RFC 3027 [8], the NAT traversal problem can be separated into three categories, which are presented in this section. In addition to the three problems, we identified Unsupported Protocols as a new category.

The first problem occurs if a protocol uses Realm-Specific IP Addresses in its payload. This happens when an application layer protocol such as the Session Initiation Protocol (SIP) places a transport address from the private realm in its payload to signal where it expects a response. Because regular NATs do not operate above layer 4, application layer protocols typically fail in such scenarios. A possible solution is the use of an application layer gateway (ALG) that extends the functionality of a NAT for specific protocols. However, an ALG supports only the application layer protocols that are specifically implemented and may fail when encryption is used.
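The realm-specific address problem can be made concrete with a small check for private addresses in an SDP body. This is only a sketch: the function name and the sample payload are made up for illustration, and only the simple connection line form `c=IN IP4 <address>` is handled. The private ranges are the RFC 1918 blocks.

```python
import ipaddress

# RFC 1918 private address blocks.
PRIVATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def private_addresses_in_sdp(sdp_text):
    """Return private IPv4 addresses found in SDP connection (c=) lines."""
    found = []
    for line in sdp_text.splitlines():
        parts = line.strip().split()
        # A simple SDP connection line looks like: c=IN IP4 <address>
        if len(parts) != 3 or parts[0] != "c=IN" or parts[1] != "IP4":
            continue
        try:
            address = ipaddress.ip_address(parts[2])
        except ValueError:
            continue  # host name or address with a suffix; ignored here
        if any(address in network for network in PRIVATE_NETWORKS):
            found.append(str(address))
    return found


sample = "v=0\nc=IN IP4 192.168.1.10\nm=audio 49170 RTP/AVP 0\n"
print(private_addresses_in_sdp(sample))
```

A NAT that rewrites only the IP and transport headers leaves such an address untouched, so the remote party would try to answer to an address it cannot reach.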

The second category is P2P Applications. The traditional Internet consists of servers located in the public realm and clients that actively establish connections to these servers. This structure is well suited for NATs because for every connection attempt (e.g., a TCP SYN) coming from an internal client, the NAT can add a mapping to its table. But unlike client-server applications, a P2P connection can be initiated by any of the peers regardless of their location. However, if a peer in the private realm tries to act as a traditional server (e.g., listening for a connection on a socket), the NAT is unaware of incoming connections and drops all packets. A solution could be that the peer located in the private domain always establishes the connection. But what if two peers, both behind a NAT, want to establish a connection to each other? Even if the security policy would allow the connection, it cannot be established.

The third category is a combination of the first two. Bundled Session Applications, such as File Transfer Protocol (FTP) or SIP/Session Description Protocol (SDP), carry realm-specific IP addresses in their payload to establish an additional session. The first session is usually referred to as the control session, whereas the newly created session is called the data session. The problem here is not only the realm-specific IP addresses, but the fact that the data session often is established from the public Internet toward the private host, a direction the NAT does not permit (e.g., active FTP).

Unsupported Protocols are typically newly developed transport protocols such as the SCTP or the DCCP that cause problems with NATs even if an internal host initiates the connection establishment. This is because current NATs do not have built-in support for these protocols. The unsupported protocols also cover protocols that cannot work with NATs because their layer 3 or layer 4 header is not available for translation. This happens when using encryption protocols such as IPSec.

Table 1. NAT behavior categories and possible NAT properties.

Classification       NAT property
Port binding         Port preservation
                     No port preservation
                     Port overloading
                     Port multiplexing
NAT binding          Endpoint-independent
                     Address- (port)-dependent
                     Connection-dependent
Endpoint filtering   Independent
                     Address restricted
                     Address and port restricted


NAT Traversal Service Categories

Instead of classifying the NAT behavior (see classification in STUN [4]), we defined four NAT traversal service categories, each making different assumptions about the purpose of the connection establishment and the infrastructure that is available. Our categorization emphasizes that the applicability of many NAT traversal techniques depends on the support of a combination of requester, the responder, globally reachable infrastructure nodes, and the role of the application. On the one hand, server applications set up a socket and wait for connections (which also applies to P2P applications). On the other hand, client applications such as VoIP clients actively initiate a connection and wait for an answer on a different port (bundled session applications). Other applications work only across NATs if both ends participate in the connection establishment (unsupported protocols). Thus, we differentiate between supporting a service and supporting a client. In this article, the client is called the requester because it actively initiates a connection.

The behavior of the NAT is important because it allows or prohibits certain NAT traversal techniques within one service category. If only one end implements NAT traversal support (e.g., by running a stand-alone framework or by built-in NAT traversal functionality), NAT traversal techniques that rely on a collaboration of both ends (e.g., ICE) are not applicable.

Our first category, requester side NAT traversal (RNT), covers scenarios where only the requester side supports NAT traversal (e.g., the application or the NAT itself). RNT helps applications that actively participate in the connection establishment and still suffer from the existence of NATs. Typical examples are applications that have problems with realm-specific IP addresses in their payload. This applies to protocols using in-band signaling on the application layer, which is related to bundled session applications with asymmetric connection establishment (e.g., VoIP using SIP/SDP).

The second category, global service provisioning (GSP), assumes that the host providing the service implements NAT traversal support, helping to make a service globally accessible. This is done by creating and maintaining a NAT mapping that then accepts multiple connections from previously unknown clients (Fig. 1). This is the main difference from RNT, which only creates a NAT mapping for one particular session (e.g., one call in the case of VoIP).

The last two categories assume support at both ends, the service and the requester. On the one side, NAT traversal is required to make a service behind a NAT globally accessible, whereas on the other side, the support at the requester allows the use of sophisticated techniques through coordinated action. Thus, service provisioning using pre-signaling (SPPS) extends the GSP category by the assumption that both hosts have interoperable frameworks (e.g., ICE [9]; NAT, URIs, Tunnels, SIP, and STUNT [NUTSS] [10]; NATBlaster [11]; or NatTrav [12]) running. This allows a selection from all available NAT traversal solutions, which leads to a high success rate of NAT traversal. In Fig. 1, the two hosts use a rendezvous point to agree on a NAT traversal technique. After creating the mapping in step 2, the service is accessible by any host, depending on the selected NAT traversal technique and the filtering strategy of the NAT. SPPS supports all types of services where a one-to-one connection is sufficient and pre-signaling is available.

The last category, secure service provisioning (SSP), is an extension of SPPS and addresses scenarios that require authorization of the remote party before initiating the NAT traversal process. The channel established in this way must be accessible only by the authorized remote party. This requires additional functionality that enforces this policy and only allows authorized users to access the service. The policy enforcement can be done at the NAT itself, at a data relay, or at a firewall.

Table 2 depicts all four service categories with popular NAT traversal techniques and shows the implications for automated NAT traversal and required signaling. First we distinguish between the service and the requester. “Support at the service” means, for example, that a framework must be deployed at the same host providing the service. The same applies to the requester. “RP” means that a rendezvous point is required for relaying data back and forth. “Signaling messages” means that some sort of signaling protocol is used for NAT traversal. Again, we differentiate between signaling at the service and signaling at the requester. A rendezvous point for signaling messages is required in case of pre-signaling. Finally, “stream independent” describes the requirement for consecutive connections. For example, a port forwarding entry must be created only once, whereas hole punching [13] requires sending a new hole punching packet for every new stream (with restricted filtering).

Table 2 shows the main differences of our service categories. RNT deals with bundled session applications that wait on a port after initiating a session (e.g., via a SIP INVITE). GSP requires only support of the service and aims to make a service globally reachable for multiple clients. SPPS and SSP combine these categories and require support at both ends. The requester initiates pre-signaling to exchange information about a global end point. The service then creates a mapping in the NAT that can be used by the client.

Applicability of NAT Traversal Techniques for NAT Traversal Service Categories

There are many different techniques for solving the NAT traversal problem in specific scenarios, but none of them provides a solution that works well with all NATs, applications, and network topologies. Another article explains many of the available protocols for NAT traversal [14] in general. This section describes the applicability of existing techniques from the application’s point of view.

RNT is required for protocols using in-band signaling (bundled session applications). Therefore, one common approach is to integrate RNT into these applications (e.g., the VoIP client), to establish port bindings on the fly. One possibility is the integration of a universal plug and play (UPnP) client. Another option is to use ALGs that are integrated in the NAT, interpreting in-band signaling and establishing mappings accordingly. ALGs are not a general solution because the NAT must implement the required logic for each protocol, and end-to-end security prohibits the interpretation of the signaling by the NAT.

Figure 1. NAT traversal service categories for applications: a) RNT; b) GSP; c) SPPS; d) SSP.

GSP depends on NAT traversal techniques that allow unrestricted access to a public end point. A control protocol can be used to directly establish a port forwarding entry in the mapping tables of the NAT, for instance, with UPnP [15]. Port forwarding entries created by UPnP are easy to maintain and work independently from NAT behavior. However, UPnP only works if the NAT is in the local network on the path to the other end point. Thus, nested NATs are not allowed, and path changes break the connectivity.

Hole punching is an alternative if UPnP is not applicable and works for NATs with an independent filtering strategy. The mapping must be refreshed periodically, for instance, by sending keep-alive packets. For NATs other than full-cone, hole-punching for GSP cannot be used because the source port of the request is unknown in advance.

SPPS makes no assumption about the accessibility of a created mapping, thus all possible techniques are applicable. Different from GSP, hole-punching for SPPS works as long as port prediction is possible. For NATs implementing restricted filtering, pre-signaling helps to create the appropriate mapping because the five-tuple of the connection is exchanged. Pre-signaling also enables the establishment of a UDP tunnel, allowing the encapsulation of unsupported protocols. SPPS also can use UPnP to establish port forwarding entries for one session.
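Port prediction for NATs that advance the external port by a constant delta can be sketched as follows. The function name is hypothetical, and the input is assumed to be a list of external ports observed for consecutive connections, for example through repeated queries to a public STUN server. If the observed differences are not constant, the sketch treats the ports as unpredictable.

```python
def predict_next_port(observed_ports):
    """Predict the next external port for a constant-delta NAT, else None."""
    if len(observed_ports) < 2:
        return None  # not enough observations to estimate a delta
    deltas = {later - earlier
              for earlier, later in zip(observed_ports, observed_ports[1:])}
    if len(deltas) != 1:
        return None  # differences vary, so the ports look random
    predicted = observed_ports[-1] + deltas.pop()
    return predicted if 1 <= predicted <= 65535 else None


print(predict_next_port([62000, 62002, 62004]))  # constant delta of 2
print(predict_next_port([62000, 61011, 63500]))  # no constant delta
```

A real implementation would also have to cope with other hosts behind the same NAT opening connections between the observations, which shifts the sequence and is not modeled here.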

SSP is an extension to SPPS that allows only authorized hosts to allocate and to use a mapping. Protocols that authorize requests and assume control over the middlebox, such as middlebox communication (MIDCOM) [16] or the NAT/Firewall Next Step in Signaling (NSIS) Layer Protocol [17], qualify for SSP. The advantage of NSIS is that it can discover and configure multiple middleboxes along the data path, thus supporting complex scenarios with nested NATs and multipath routing. However, if one NAT on the path does not support the protocol, NSIS fails. Using NSIS and MIDCOM for SSP requires restrictive rules that allow only authorized clients to use the mapping, for instance, by opening pinholes for IP five-tuples. UPnP is not useful for SSP because it forwards inbound packets without considering the source transport address. Hole punching can be used with SSP only if the NAT implements a restricted filtering strategy. All cases discussed previously rely on additional measures to prohibit IP spoofing. The use of secure tunnels impedes IP spoofing and allows secure NAT traversal, even for unsupported protocols (e.g., IPSec, SCTP, DCCP). SSP also can be achieved by using traversal using relay NAT (TURN) with authentication, authorization, and secure communication (e.g., via transport layer security [TLS]).

ICE [9] is under standardization by the IETF and strives to combine several techniques into a framework flexible enough to work with all network topologies. Because ICE requires both peers to have an ICE implementation running, it can be seen as a technique for SPPS or SSP, depending on the accessibility and the security policies of the public endpoint.

Table 2. Service categories and their implications for automated NAT traversal; RP denotes rendezvous point.

Columns: Service category; NAT traversal technique; Requires support at (Service, Requester, RP, NAT); Signaling messages (Service, Requester, RP, STUN); Stream-independent

RNT
  NAT with ALG                                     X
  UPnP (for bundled session applications)          X X X X
GSP
  UPnP (port forwarding)                           X X X X
  Hole punching (independent filtering)            X X X X
  Open data relay (e.g., RSIP)                     X X X X X
SPPS
  Hole punching (independent binding)              X X X X X X
  UPnP                                             X X X X X X
  Closed/open data relay (e.g., TURN, Skype)       X X X X X X
  Tunneling (e.g., over UDP)                       X X X X X
SSP
  Hole punching (restricted filtering)             X X X X X X
  NSIS NATFW NSLP                                  X X X X X
  Closed data relay (e.g., TURN)                   X X X X X X
  Tunneling (e.g., over secure channel)            X X X X X


The same is true for solutions such as TURN [18]. TURN is a promising candidate for SPPS, because it provides a relay with a public transport address allowing the exchange of data packets between a TURN client and a public host.

Why Unilateral Solutions Exist

Despite the great flexibility of SPPS and SSP, both categories involve a number of assumptions that are not always satisfied. The most important one is the requirement for both ends (and sometimes also the infrastructure) to support compatible versions of the NAT traversal framework. It remains to be seen if the future will bring a sufficiently big deployment of one framework on which to rely for arbitrary applications. The chances are better within homogeneous problem domains, like telecommunication, where such frameworks can be integrated with the applications and be distributed in large numbers. For instance, the adoption of ICE is occurring mainly within the VoIP/SIP community and focusing on VoIP specific use cases. These drawbacks are the reason why RNT and GSP as unilateral solutions for the NAT traversal problems exist. It is easier to enhance an infrastructure under one responsibility than to rely on a solution that requires a global deployment. However, unilateral solutions are limited to the middleboxes in the given domain. They fail to provide solutions to scenarios with nested NATs and depend on the network topology.

Coalescing Unilateral and Cooperative Approaches for NAT Traversal

When investigating existing NAT traversal techniques, we determined that none of them can be used in all scenarios. For example, UPnP only supports globally accessible end points, whereas ICE requires both hosts to run the framework. In [19], we proposed a new framework that aims toward providing an advanced NAT traversal service (ANTS) supporting all four service categories. The concept of ANTS is based on the idea of reusing previously obtained knowledge about the topology of the network and the capability of the NAT. A small component of ANTS, the NAT tester, is responsible for gathering this information and will be presented (together with some test results) in the next section.

If a user decides that a particular application should be reachable from the public Internet, he registers it at a session manager that keeps track of all applications requesting NAT traversal support. With the session manager, ANTS can provide GSP and RNT directly. Whenever an application is added and associated with GSP or RNT, the session manager calls the NAT traversal logic and asks to allocate an appropriate mapping in the NAT. This also requires ANTS to have sufficient knowledge about the applicability of the integrated techniques regarding the service categories. For example, UPnP cannot be used for SSP because it violates the idea of an endpoint that is accessible only by authenticated hosts.

Figure 2 shows a decision tree that ANTS uses to establish a mapping in the NAT. First, we distinguish between requester initiated NAT traversal on the one hand and the access to a service on the other hand. Then, we must know which ends actually implement ANTS. If both hosts have the framework running, pre-signaling is possible, which leads to a wide choice of techniques depending on the security considerations of the mapping. If only one end supports ANTS, only techniques belonging to GSP or RNT are applicable.
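The decision logic described above can be sketched as a small function. This is an illustration, not the ANTS implementation: the function name, the parameter names, and the `None` result for combinations that the tree does not cover are assumptions made for this example.

```python
def select_service_category(requester_initiated, support_at_requester,
                            support_at_service, secure_endpoint=False):
    """Map a NAT traversal request to a service category, or None.

    requester_initiated: True for requester initiated NAT traversal,
                         False for access to a service
    """
    if support_at_requester and support_at_service:
        # Pre-signaling is possible; security needs decide the category.
        return "SSP" if secure_endpoint else "SPPS"
    if requester_initiated and support_at_requester:
        return "RNT"
    if not requester_initiated and support_at_service:
        return "GSP"
    return None  # no supported branch in this sketch


print(select_service_category(True, True, False))
print(select_service_category(False, False, True))
print(select_service_category(False, True, True, secure_endpoint=True))
```

Keeping the category selection separate from the choice of a concrete technique mirrors the text: the category follows from who supports the framework, while the technique additionally depends on the NAT behavior.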

Despite some unsolved issues such as the question of how to connect legacy applications to ANTS (e.g., by using a library or a traversal of UDP through NAT [TUN]-based approach), the idea of a knowledge-based framework seems to be the right answer. Thus once implemented, ANTS can help many existing services by integrating several techniques and making its choice based on knowledge about the NAT and the requirements of the application.

Table 3. Results of the field test: success rates of NAT traversal techniques depending on service categories.

Service category  Protocol  Condition                             Success rate
RNT               UDP       (UPnP or HP-UDP)                      90.27%
                  TCP       (UPnP or HP-TCP)                      77.84%
GSP               UDP       (Full Cone and HP-UDP)                27.03%
                  TCP       (Full Cone and HP-TCP)                17.30%
                  UDP       (UPnP or (Full Cone and HP-UDP))      50.27%
                  TCP       (UPnP or (Full Cone and HP-TCP))      44.32%
SPPS              UDP       (HP-UDP)                              88.65%
                  TCP       (HP-TCP)                              71.35%
                  TCP       (HP-TCP or HP-UDP)                    94.59%
                  UDP       (UPnP or HP-UDP)                      90.27%
                  TCP       (UPnP or HP-TCP)                      77.84%
                  TCP       (UPnP or HP-TCP or HP-UDP)            95.14%
SSP               UDP       (Restricted NAT and HP-UDP)           48.65%
                  TCP       (Restricted NAT and HP-TCP)           38.38%

Figure 2. Decision tree for ANTS. A NAT traversal request is either requester initiated or an access to a service. With support at both ends, a secure endpoint leads to SSP and an insecure endpoint to SPPS; with support only at the client, the result is RNT; with support only at the service, the result is GSP.

Field Test on NAT Traversal

To prove that existing techniques can be adapted to our service categories, we implemented a NAT tester that acts as a cornerstone for our new framework. This section presents the results of a field test investigating 185 NATs in the wild. For a detailed description including all results, see our Web site: http://nettest.net.in.tum.de.

The first test queries a public STUN server to determine the type of the NAT. Afterward, the NAT tester performs the following connection tests and tries to establish a connection to the host behind the NAT: UPnP, hole punching, and connecting to a data relay (each for both protocols, UDP and TCP) (Table 3).

We then adapted the test results to our work and evaluated the success rates of the individual techniques with regard to our defined service categories. Table 3 shows the categories and the conditions that must be met according to the considerations made previously. For example, to make a service globally accessible, GSP requires either UPnP or hole punching in combination with a full-cone NAT. By this measure, 50.27 percent of our tested NATs supported a direct connection for UDP in category GSP (44.32 percent for TCP). In all other cases (the remaining percentages), an external relay must be used to provide GSP.

For SPPS, which makes no security assumptions, we divided our results into two categories. First, we determined the success rates without considering UPnP. With 88.65 percent of all NATs, we were able to establish a direct UDP connection to the host behind the NAT (71.35 percent for TCP). These rates increased slightly when UPnP was an option (to 90.27 percent for UDP and 77.84 percent for TCP). The highest success rate for TCP NAT traversal (95.14 percent) was achieved when we also allowed the tunneling of TCP packets through UDP.

SSP allows only authorized hosts to create and to use a mapping. Therefore, a suitable technique for SSP is hole punching in combination with a NAT implementing a restricted filtering strategy. This combination was supported by 48.65 percent of the tested NATs for UDP and by 38.38 percent for TCP.
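The evaluation described above amounts to applying a Boolean condition from Table 3 to the per-NAT test outcomes and counting the matches. The following sketch shows the idea; the record layout and the four sample records are synthetic assumptions, not field-test data.

```python
def success_rate(nats, condition):
    """Percentage of tested NATs for which `condition` holds."""
    hits = sum(1 for nat in nats if condition(nat))
    return 100.0 * hits / len(nats)


# Synthetic records, NOT field-test data: one dict of test outcomes per NAT.
nats = [
    {"upnp": True,  "hp_udp": True,  "hp_tcp": False, "full_cone": False},
    {"upnp": False, "hp_udp": True,  "hp_tcp": True,  "full_cone": True},
    {"upnp": False, "hp_udp": False, "hp_tcp": False, "full_cone": False},
    {"upnp": False, "hp_udp": True,  "hp_tcp": False, "full_cone": False},
]


def gsp_udp(n):
    # Condition "UPnP or (Full Cone and HP-UDP)" from Table 3.
    return n["upnp"] or (n["full_cone"] and n["hp_udp"])


def spps_tcp_with_tunnel(n):
    # Condition "UPnP or HP-TCP or HP-UDP" from Table 3.
    return n["upnp"] or n["hp_tcp"] or n["hp_udp"]


print(success_rate(nats, gsp_udp))               # 50.0
print(success_rate(nats, spps_tcp_with_tunnel))  # 75.0
```

For scale, with 185 tested NATs the reported 50.27 percent corresponds to 93 devices.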

The success rate for RNT depends on the effort that is made for the specific protocol. For example, if we assume that we can inspect each signaling packet on the application layer thoroughly, we could adopt the results from SPPS for RNT. If we only modified the packets in such a way that the internal port is reachable by any client, the success rate of GSP would apply to RNT. Finally, we did not measure the effect of NATs with integrated ALGs in this field test.

Conclusion

With the increasing popularity of P2P communication, the NAT traversal problem has become more urgent than ever. Existing solutions have the drawback of supporting only certain types of NATs and cannot be viewed as a general solution to the problem. When analyzing the NAT traversal problem more thoroughly, we discovered that the question of who supports the NAT traversal framework determines which NAT traversal techniques are applicable. Therefore, we identified four NAT traversal service categories that differentiate between support by service, client, and infrastructure and listed applicable NAT traversal techniques for each category. Our findings from a field test showed that there are a number of prospective NAT traversal techniques that enable connectivity for each NAT traversal service category. We emphasized how to build upon this categorization to develop a knowledge-based NAT traversal framework. Future frameworks that aspire to support the typical connectivity scenarios of current applications should support all four service categories.

References[1] K. Egevang and P. Francis, “The IP Network Address Translator (NAT),” IETF

RFC 1631, May 1994.[2] IETF, “Behavior Engineering for Hindrance Avoidance (behave);”

http://www.ietf.org[3] P. Srisuresh and M. Holdrege, “IP Network Address Translator (NAT) Termi-

nology and Considerations,” IETF RFC 2663, Aug. 1999.[4] J. Rosenberg et al., “STUN: Simple Traversal of User Datagram Protocol (UDP)

through Network Address Translators (NATs),” IETF RFC 3489, Mar. 2003.[5] E. F. Audet and C. Jennings, “NAT Behavioral Requirements for Unicast

UDP,” IETF RFC 4787, Jan. 2007.[6] S. Guha and P. Francis, “Characterization and Measurement of TCP Traver-

sal through NATs and Firewalls,” Proc. ACM Internet Measurement Conf.,Berkeley, CA, Oct. 2005.

[7] G. Huston, “Anatomy: A Look Inside Network Address Translators,” TheInternet Protocol J., vol. 7, 2004, pp. 2–32.

[8] M. Holdrege and P. Srisuresh, “Protocol Complications with the IP NetworkAddress Translator,” IETF RFC 3027, Jan. 2001.

[9] J. Rosenberg, “Interactive Connectivity Establishment (ICE): A Protocol forNetwork Address Translator (NAT) Traversal for Offer/Answer Protocols,”IETF Internet draft, work in progress, Oct. 2007.

[10] P. Francis, S. Guha, and Y. Takeda, “NUTSS: A SIP-based Approach toUDP and TCP Network Connectivity,” Cornell Univ., Panasonic Commun.,tech. rep., 2004.

[11] A. Biggadike et al., “NATBLASTER: Establishing TCP Connections betweenHosts behind NATs,” ACM SIGCOMM Asia Wksp., Beijing, China, 2005.

[12] J. Eppinger, “TCP Connections for P2P Applications — A SoftwareApproach to Solving the NAT Problem,” Carnegie Mellon Univ., Pittsburgh,PA, tech. rep., 2005.

[13] B. Ford, P. Srisuresh, and D. Kegel, “Peer-to-Peer Communication acrossNetwork Address Translation,” MIT, tech. rep., 2005.

[14] H. Khlifi, J. Gregoire, and J. Phillips, “VoIP and NAT/Firewalls: Issues, TraversalTechniques, and a Real-World Solution,” IEEE Commun. Mag., July 2006.

[15] U. Forum, “Internet Gateway Device (IGD) Standardized Device ControlProtocol,” Nov. 2001.

[16] P. Srisuresh et al., “Middlebox Communication Architecture and Frame-work,” IETF RFC 3303, Aug. 2002.

[17] M. Stiemerling et al., “NAT/Firewall NSIS Signaling Layer Protocol (NSLP),”IETF Internet draft, Feb. 2008.

[18] J. Rosenberg, R. Mahy, and P. Matthews, “Traversal Using Relays aroundNAT (TURN),” IETF Internet draft, work in progress, June 2008.

[19] A. Müller, A. Klenk, and G. Carle, “On the Applicability of Knowledge-Based NAT-Traversal for Future Home Networks,” Proc. IFIP Networking2008, Springer, Singapore, May 2008.

Biographies

ANDREAS MÜLLER ([email protected]) received his diploma degree in computer science from the University of Tübingen, Germany, in 2007. Currently, he is a research assistant and Ph.D. candidate at the Network Architecture and Services Department at the Technical University of Munich. His research interests include middleboxes, P2P systems, and autonomic networking.

ANDREAS KLENK ([email protected]) earned his diploma degree in computer science from Ulm University, Germany, in 2003. He is a Ph.D. candidate and research assistant at the University of Tübingen and works with Professor Carle. He contributes to European research projects in the telecommunication field. His research interests include negotiation and security in autonomic systems.

GEORG CARLE ([email protected]) received an M.Sc. degree from Brunel University London in 1989, a diploma degree in electrical engineering from the University of Stuttgart in 1992, and a doctoral degree from the faculty of computer science, University of Karlsruhe, in 1996. He is a full professor in computer science at the Technical University of Munich, where he is chair of the Department of Network Architecture and Services. Among the focal interests of his research are Internet technology and mobile communication in combination with security.


Abstract

The lack of a concise and standard language to describe diverse middlebox functionality and deployment configurations adversely affects current middlebox deployment, as well as middlebox-related research. To alleviate this problem, we present a simple middlebox model that succinctly describes how different middleboxes process packets and illustrate it by representing four common middleboxes. We set up a pilot online repository of middlebox models and prototyped model inference and validation tools.


Middleboxes, like firewalls, NATs, load balancers, and intrusion-prevention boxes, have become an integral part of networks today. There is great diversity in how these middleboxes process and transform packets, and in how they are configured and deployed. For example, a firewall is commonly connected inline on the physical network path and transparently forwards packets unmodified or drops them. A load balancer, on the other hand, rewrites packet headers and contents and often requires packets to be explicitly IP addressed and forwarded to it.

There is currently no standard way to succinctly describe the complexity and diversity of middlebox packet processing and deployment mechanisms. Middlebox taxonomies like RFC 3234 [1] provide only a high-level classification of middleboxes. Details about middlebox operations and deployment configurations often are buried in different middlebox- and vendor-specific configuration manuals or simply are not documented clearly. Efforts like the Unified Firewall Model [2] and BEHAVE [3] provide models to describe the operations of specific middleboxes like firewalls and NATs.

The lack of a concise and standard language to describe different middleboxes adversely affects current middlebox deployment and hinders middlebox-related research. Correctly deploying and configuring a middlebox is a challenging task by itself. Without a clear understanding of how different middleboxes process packets and interact with the network and with other middleboxes, network planning, verification of operational correctness, and troubleshooting become even more complicated. In our own research experience of designing and implementing the policy-aware switching layer [4], a new mechanism to overhaul the ad hoc manner in which middleboxes are deployed in data centers today, the non-availability of clear information about how some middleboxes process packets led to initial design decisions that were wrong and that later manifested as hard-to-debug errors during testing.

In this article, we present a general model to clearly and succinctly describe the functionality and deployment configurations of a middlebox. Through sets of preconditions and processing rules, the model describes the types of packets expected by a middlebox and how it transforms them. Later, we provide more details of our model and illustrate it by representing four common middleboxes.

The middlebox model provides a standard language to concisely describe different middleboxes. We are building an online repository of middlebox models at http://www.middlebox.org, which we envision as filled with models of various commonly used middleboxes. To ease model construction, we prototyped a tool that infers hints about the operations of a particular middlebox through black box testing. We also prototyped a tool that validates the operations of a middlebox against its model and thus helps detect unexpected behavior. We discuss these and other applications of our model later.

The Model

RFC 3234 [1] defines a middlebox as “an intermediary device performing functions other than the normal, standard functions of an IP router on the datagram path between a source host and destination host.” We refine this high-level definition of a middlebox to construct a simple model that describes various aspects of middlebox functionality and operations. A middlebox in our model consists of zones, input preconditions, state databases, processing rules, auxiliary traffic, and the interest and state fields deduced from the processing rules. In this section, we describe and illustrate our model using four common middleboxes: a firewall, a NAT, a layer-4 load balancer, and an SSL-offload-capable layer-7 load balancer. Table 1 describes the notations used.

Interfaces and Zones

Packets enter and exit a middlebox through one or more of its physical network interfaces. Each physical interface belongs to one or more logical network zones. A zone represents a packet entry and exit point from the perspective of middlebox functionality. A middlebox processes packets differently based on their ingress and egress zones.

For example, the firewall shown in Fig. 1a has two physical interfaces, one belonging to the red zone that represents the insecure external network, and the other belonging to the green zone representing the secure internal network. Packets entering through the red zone are more stringently checked than those entering through the green zone. Similarly, the NAT in Fig. 1b has two different physical network interfaces, one belonging to the internal network (zone int) and the other belonging to the external network (zone ext). The source IP and port number are rewritten for packets received at zone int, whereas the destination IP and port number are rewritten for packets received at zone ext. Figure 1c shows a load balancer with a single physical network interface that belongs to two different zones: zone inet representing the


Dilip Joseph and Ion Stoica, University of California at Berkeley

Modeling Middleboxes


Internet and zone srvr representing the Web server farm. The load balancer spreads out packets received at zone inet to Web server instances in zone srvr.

We assume that the mapping between interfaces and zones is pre-determined by the middlebox vendor or configured during middlebox initialization. Frames reaching an interface belonging to multiple zones are distinguished by their virtual local area network (VLAN) tags, IP addresses, and/or transport port numbers.

Input Preconditions

Input preconditions specify the types of packets that are accepted by a middlebox for processing. For example, a transparent firewall processes all packets received by it, whereas a load balancer in a single-legged configuration processes a packet arriving at its inet zone only if the packet is explicitly addressed to it at layers 2, 3, and 4. Similarly, a NAT processes all packets received at its int zone, but requires those received at its ext zone to be addressed to it at layers 2 and 3.

Input preconditions are represented using a clause of the form I(P, p), which is true if the headers and contents of packet p match the pattern P. For example, the firewall has the input precondition I(<*>, p), and the load balancer has I(<dm = MACLB, di = IPLB, dp = 80>, p) for its inet zone, where MACLB and IPLB are the layer-2 and layer-3 addresses of the load balancer. Although I(<*>, p) is a tautology, we still explicitly specify it in the firewall model to enhance model clarity.
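One possible way to read the I(P, p) clause in code is as a field-by-field match, sketched below. The dictionary representation of patterns and packets, the name `matches`, and the concrete addresses standing in for MACLB and IPLB are all assumptions made for illustration.

```python
WILDCARD = {}  # the pattern <*>: no field is constrained


def matches(pattern, packet):
    """I(P, p): true if every field constrained by P has that value in p."""
    return all(packet.get(field) == value for field, value in pattern.items())


# Hypothetical layer-2 and layer-3 addresses of the load balancer.
lb_inet = {"dm": "02:00:00:00:00:01", "di": "192.0.2.10", "dp": 80}

pkt = {"sm": "02:00:00:00:00:aa", "dm": "02:00:00:00:00:01",
       "si": "198.51.100.7", "di": "192.0.2.10", "sp": 40000, "dp": 80}

print(matches(WILDCARD, pkt))                # True: <*> accepts any packet
print(matches(lb_inet, pkt))                 # True: addressed to the balancer
print(matches(lb_inet, dict(pkt, dp=443)))   # False: wrong destination port
```

Because `all` over an empty pattern is true, the wildcard precondition is a tautology here as well.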

State Database

Most middleboxes maintain state associated with the flows and sessions they process. Our model represents state using key-value pairs stored in zone-independent or zone-specific state databases. Processing rules (described next) record the state using the set primitive and query state using the get? primitive.

Accurately tracking state removal is hard, unless explicitly specified by the del primitive in a processing rule. Although state expiration timeouts can be specified as part of the set primitive, inaccuracies in timeout values or in their fine-grained measurement can cause discrepancies between the model-predicted behavior of a middlebox and its actual operations. As we illustrate in the next section, we use special processing rules to flag such possible discrepancies.
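A per-zone state database with the set, get?, and del primitives can be sketched as follows. The class and method names and the explicit `now` argument (used instead of a real clock so the behavior is reproducible) are assumptions of this sketch, not part of the model's definition.

```python
class StateDB:
    """Key-value state for one zone, with optional expiration timeouts."""

    def __init__(self):
        self._entries = {}  # key -> (value, expiry time or None)

    def set(self, key, val, now=0.0, timeout=None):
        expiry = None if timeout is None else now + timeout
        self._entries[key] = (val, expiry)

    def get(self, key, now=0.0):
        """Model of get?: returns (True, val) if key is present and unexpired."""
        entry = self._entries.get(key)
        if entry is None:
            return False, None
        val, expiry = entry
        if expiry is not None and now >= expiry:
            del self._entries[key]  # the model's own view of expiration
            return False, None
        return True, val

    def delete(self, key):
        """Model of the del primitive (del is a reserved word in Python)."""
        self._entries.pop(key, None)


db = StateDB()
db.set(("10.0.0.2", 5000), 61000, now=0.0, timeout=30.0)
print(db.get(("10.0.0.2", 5000), now=10.0))  # (True, 61000)
print(db.get(("10.0.0.2", 5000), now=30.0))  # (False, None)
```

If the timeout assumed here differs from the one the real device uses, the model and the device disagree about whether state exists, which is exactly the kind of discrepancy the text describes.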

Processing Rules

Processing rules model the core functionality of a middlebox. A processing rule specifies the action taken by a middlebox when a particular condition becomes true. For example, the processing of an incoming packet is represented by a rule of the general form:

Z(A, p) ∧ I(P, p) ∧ C(p) ⇒ Z(B, T(p)) ∧ state ops

The above rule indicates that a packet p reaching zone A of the middlebox is transformed to T(p) and emitted through zone B, if it satisfies the input precondition I(P, p) and a middlebox-specific condition C(p). In addition, the middlebox may update state associated with the TCP flow or application session to which the packet belongs. We now present concrete examples of processing rules for common middleboxes.

Firewall: First, consider a simple stateless layer-4 firewall that either drops a packet received on its red zone or relays it unmodified to the green zone. This behavior can be represented using the following two rules:

Z(red, p) ∧ I(<*>, p) ∧ Caccept(p) ⇒ Z(green, p)
Z(red, p) ∧ I(<*>, p) ∧ Cdrop(p) ⇒ DROP(p)

Since I(<*>, p) is a tautology, whether a packet is dropped or accepted by the firewall is solely determined by the Caccept and Cdrop clauses that represent the filtering functionality of the firewall. Common filtering rules can be represented easily using the appropriate Boolean expressions (e.g., Caccept(p) : p.dp = 80 || p.si = 128.34.45.6). For more complex filtering rules, we leverage external middlebox-specific

Table 1. Notations used in this article.

Notation             Meaning
∧                    Logical AND operation
!                    Logical NOT operation
sm                   Source MAC (layer 2) address
dm                   Destination MAC (layer 2) address
si                   Source IP (layer 3) address
di                   Destination IP (layer 3) address
sp                   Source TCP/UDP (layer 4) port
dp                   Destination TCP/UDP (layer 4) port
p                    Packet
[hd]                 Packet with header h and payload d
5tpl                 Packet 5-tuple: si, di, sp, dp, proto
Xrev                 Swaps any source-destination IP, MAC, or port number pairs in X
Z(A, p)              True if packet p arrived at or departed zone A
I(P, p)              Input precondition; true if packet p matches pattern P
C(p)                 Condition specific to middlebox functionality
newflow?(p)          True if packet p indicates a new flow, e.g., TCP SYN
set(A, key → val)    Stores the specified key-value pair in zone A’s state database
S : get?(A, key)     Returns true and assigns val to S if key → val is present in zone A’s state database


models like the Unified Firewall Model [2] to construct the appropriate C clauses. Rules for packets in the green → red direction are similar.
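The two red-zone firewall rules can be sketched as a single function. The packet representation and function names are assumptions, the filter is written in the spirit of the example filter in the text (destination port 80 or one specific source IP), and Cdrop is taken here to be simply the negation of Caccept, which the model itself does not require.

```python
def c_accept(p):
    # Illustrative filter: destination port 80 or a specific source IP.
    return p["dp"] == 80 or p["si"] == "128.34.45.6"


def firewall_red(p):
    """Apply the red-zone rules: return (egress zone, packet), or None for DROP."""
    if c_accept(p):
        return ("green", p)  # relayed unmodified to the green zone
    return None              # Cdrop assumed to be the negation of Caccept


web = {"si": "198.51.100.7", "di": "10.0.0.5", "sp": 40000, "dp": 80}
ssh = {"si": "198.51.100.7", "di": "10.0.0.5", "sp": 40001, "dp": 22}
print(firewall_red(web))  # ('green', {...}) with the packet unchanged
print(firewall_red(ssh))  # None
```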

NAT: Next, consider another very common middlebox, a NAT. Unlike the firewall in the previous example, a NAT rewrites packet headers and maintains per-flow state. We first describe the processing rules (rule box 1) for a full cone NAT and then, with minor modifications, change them to represent a symmetric NAT.

Rule (i) describes how a full cone NAT processes a packet [hd] with a previously unseen [si, sp] pair received at its int zone. It allocates a new port number using a standard mechanism like random or sequential selection, or using a custom mechanism beyond the scope of our general model. It stores [si, sp] → newport and newport → [si, sp] in the state databases of zone int and zone ext, respectively. It rewrites the packet header h by applying the source NAT (SNATfwd) transformation function: the source medium access control (MAC) and IP addresses are replaced with the publicly visible addresses of the NAT, the source port with the newly allocated port number, and the destination MAC with that of the next-hop IP gateway of the NAT. The packet with the rewritten header and unmodified payload is then emitted through the ext zone. Rule (ii) specifies that the NAT emits a packet with a previously seen [si, sp] pair through zone ext, after applying SNATfwd with the port number recorded in rule (i). Rule (iii) describes how the NAT processes a packet reaching the ext zone. It retrieves the newport → [si, sp] state recorded in rule (i) using the destination port number of the packet, applies the reverse source NAT transformation function (SNATrev), and then emits the modified packet through zone int. Rules (iv) and (v) flag discrepancies resulting from the inaccuracy of the model in tracking state expiration. The NAT may drop a packet arriving at its int or ext zone because the state associated with the packet expired without the knowledge of the model.

Unlike a full cone NAT, a symmetric NAT allocates a separate port for each [si, sp, di, dp] tuple seen at its int zone, rather than for each [si, sp] pair. Thus, for a symmetric NAT, the zone int state set in rule (i) and retrieved in rules (ii) and (iv) is keyed by [h.si, h.sp, h.di, h.dp] rather than by just [h.si, h.sp]. A symmetric NAT is also more restrictive than a full cone NAT. It relays a packet with header [IPs, IPNAT, PORTs, PORTd] from the ext zone only if it had earlier received a packet destined to IPs : PORTs at the int zone and had rewritten its source port to PORTd. This restrictive behavior is captured by keying the zone ext state set in rule (i) and retrieved in rules (iii) and (v) with [h.di, h.dp, newport] rather than with just newport. Other NAT types, like restricted cone and port restricted cone, can be easily represented with similar minor modifications.
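The difference between the full cone and symmetric variants comes down to how the two state databases are keyed. The sketch below models rules (i) to (iii) at the IP and port level only; MAC rewriting and the discrepancy rules (iv) and (v) are left out, and the class name, the sequential port allocator, and the starting port are assumptions.

```python
import itertools


class NatModel:
    """Rules (i)-(iii) of the NAT model; `symmetric` switches the state keys."""

    def __init__(self, public_ip, symmetric=False, first_port=40000):
        self.public_ip = public_ip
        self.symmetric = symmetric
        self.int_db = {}  # zone int state: flow key -> newport
        self.ext_db = {}  # zone ext state: key -> (si, sp)
        self._ports = itertools.count(first_port)  # sequential allocation

    def _int_key(self, h):
        if self.symmetric:
            return (h["si"], h["sp"], h["di"], h["dp"])
        return (h["si"], h["sp"])

    def from_int(self, h):
        """Rules (i) and (ii): rewrite source IP and port, emit via zone ext."""
        key = self._int_key(h)
        if key not in self.int_db:  # rule (i): previously unseen key
            port = next(self._ports)
            self.int_db[key] = port
            ext_key = (h["di"], h["dp"], port) if self.symmetric else port
            self.ext_db[ext_key] = (h["si"], h["sp"])
        return dict(h, si=self.public_ip, sp=self.int_db[key])

    def from_ext(self, h):
        """Rule (iii): restore the internal destination; None if no state."""
        # For an inbound packet the remote endpoint is its source.
        ext_key = (h["si"], h["sp"], h["dp"]) if self.symmetric else h["dp"]
        if ext_key not in self.ext_db:
            return None
        si, sp = self.ext_db[ext_key]
        return dict(h, di=si, dp=sp)


out = {"si": "10.0.0.2", "sp": 5000, "di": "198.51.100.7", "dp": 80}
stranger = {"si": "192.0.2.99", "sp": 1111, "di": "203.0.113.1", "dp": 40000}

cone = NatModel("203.0.113.1")
cone.from_int(out)
print(cone.from_ext(stranger) is not None)  # True: any host may use the mapping

sym = NatModel("203.0.113.1", symmetric=True)
sym.from_int(out)
print(sym.from_ext(stranger) is not None)   # False: only the contacted host may
```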

Layer-4 Load Balancer: Next, we present a layer-4 load balancer, which, unlike the NAT in the previous example, rewrites the destination IP address of a packet to that of an available Web server (rule box 2).

Rule (i) describes how the load balancer processes the first packet of a new flow received at its inet zone. The load balancer dynamically selects a Web server instance Wi for the flow and records it in the state database of the inet zone. It rewrites the destination IP and MAC addresses of the packet to those of Wi using the destination NAT (DNATfwd) transformation function and then emits it through the srvr zone. It also records this flow in the state database of the srvr zone, keyed by the five-tuple of the packet expected there in the reverse flow direction. Rule (ii) specifies that subsequent packets of the flow are simply emitted after rewriting the destination IP and MAC addresses to those of the recorded Web server instance. Rule (iii) describes how the load balancer processes a packet received from a Web server. It verifies the existence of flow state for the packet and then emits it through the inet zone after applying the reverse DNAT transformation, that is, rewriting the source IP and MAC addresses to those of the load balancer and the destination MAC to that of the next-hop IP gateway.

Although the Web server instance selection mechanism is beyond the scope of our general model, the load balancer model can easily be augmented with primitives to represent common selection mechanisms like least loaded and round robin. In the previous example, we assumed that the load balancer was set as the default IP gateway at each Web server. Other load balancer deployment configurations (e.g., direct server return or source NAT) can be represented with minor modifications.
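The three layer-4 load balancer rules can be sketched at the IP level as follows. MAC rewriting is omitted, round robin stands in for the unspecified server selection mechanism, and the class, method, and field names are assumptions of this sketch.

```python
import itertools


class L4LoadBalancer:
    """Rules (i)-(iii) of the layer-4 load balancer model, IP layer only."""

    def __init__(self, lb_ip, servers):
        self.lb_ip = lb_ip
        self._next = itertools.cycle(servers)  # round robin, one possible policy
        self.inet_db = {}     # forward 5-tuple -> chosen server
        self.srvr_db = set()  # 5-tuples expected back from the servers

    @staticmethod
    def _tpl(h):
        return (h["si"], h["di"], h["sp"], h["dp"], h["proto"])

    def from_inet(self, h, newflow):
        key = self._tpl(h)
        if newflow:  # rule (i): pick a server and record state in both zones
            server = next(self._next)
            self.inet_db[key] = server
            # Reverse-direction 5-tuple expected at zone srvr.
            self.srvr_db.add((server, h["si"], h["dp"], h["sp"], h["proto"]))
        elif key not in self.inet_db:
            return None  # rule (ii) needs recorded state
        return dict(h, di=self.inet_db[key])  # DNATfwd on the destination IP

    def from_srvr(self, h):
        if self._tpl(h) not in self.srvr_db:  # rule (iii) requires flow state
            return None
        return dict(h, si=self.lb_ip)  # DNATrev on the source IP


lb = L4LoadBalancer("192.0.2.10", ["10.1.0.1", "10.1.0.2"])
syn = {"si": "198.51.100.7", "di": "192.0.2.10", "sp": 40000, "dp": 80,
       "proto": "tcp"}
print(lb.from_inet(syn, newflow=True)["di"])   # 10.1.0.1
reply = {"si": "10.1.0.1", "di": "198.51.100.7", "sp": 80, "dp": 40000,
         "proto": "tcp"}
print(lb.from_srvr(reply)["si"])               # 192.0.2.10
```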

Layer-7 Load Balancer: We now present our most complex example, a layer-7 SSL offload-capable load balancer. This example illustrates how our model describes a middlebox whose processing spans both packet headers and contents and is not restricted to one-to-one packet transformations. The layer-7 load balancer is the end point of the TCP connection from a client (the CL connection). Because accurately modeling TCP is very hard, we abstract it using a black box TCP state machine tcpCL and buffer the data received from the client in a byte queue DCL. The I clauses are similar to those in the layer-4 load balancer and hence are not repeated in rule box 3.

Rule (i) specifies that the load balancer creates tcpCL and DCL and records them along with the packet header on receiving the first packet of a new flow from a client at the inet zone. Rule (ii) specifies how the TCP state and data queue of the CL connection are updated as the packets of an existing flow arrive from the client. Rule (iii), triggered when tcpCL has data or acknowledgments to send, specifies that packets from the load balancer to the client will have header hCLrev, that is, hCL with its source and destination fields swapped (with appropriate sequence numbers filled in by tcpCL), and payload read from the DLS queue, if it was already created by the firing of rule (iv). Rule (iv), triggered when the data collected in DCL is sufficient to parse the HTTP request URL and/or cookies, specifies that the load balancer selects a Web server instance Wi and opens a TCP connection to it, that is,

Figure 1. Zones of different middleboxes: a) firewall; b) NAT; and c) load balancer in single-legged configuration. [Figure not reproduced. Labels: (a) red zone toward the insecure external network and green zone toward the secure internal network; (b) external zone toward the external network and internal zone toward the internal network; (c) Internet zone and server zone, with a switch between the Internet and the Web servers.]


creates tcpLS and DLS. It also installs a pointer to the state indexed by the DNATed header hLS in the state database of the srvr zone. Rule (v) shows how this state is retrieved, and its tcpLS and DLS are updated, on receipt of a packet from a Web server. Rule (vi) specifies the header and payload of packets sent by the load balancer to a Web server instance: hLS and data read from DCL.

The rules listed above represent a plain layer-7 load balancer. By replacing the + and read data queue operations with +ssl and readssl operations that perform SSL encryption and decryption on the data, we can represent an SSL offload-capable load balancer without disturbing other rules. Similar to the TCP black box, we abstract out the details of the SSL protocol.

Auxiliary Traffic

In addition to its core functionality of transforming and forwarding packets, a middlebox can generate additional traffic, either independently or when triggered by a received packet. For example, a load balancer periodically checks the liveness of its target servers by making TCP connections to each server. It also can send an Address Resolution Protocol (ARP) request for the layer-2 address of the Web server assigned to a received packet. Such packets generated by middleboxes and their responses, which support middlebox functionality, are referred to as auxiliary traffic in our model.

Auxiliary traffic is represented using processing rules, as well. For example, the auxiliary traffic associated with the load balancer can be represented as in rule box 4.

The PROBE function returns a set of packets to check the liveness of server Wi. In the simple case, these are just the TCP handshake packets with the appropriate sm, dm, si, di, sp, and dp.

Interest and State Fields

The interest fields of a middlebox identify the packet fields of interest, that is, the fields it reads or modifies. The state fields identify the subset of the interest fields used by the middlebox in storing and retrieving state. Although these fields can be deduced from the processing rules, they are explicitly presented in the model because they can succinctly highlight unexpected aspects of middlebox processing.

Utility of a Middlebox Model

A middlebox model is useful only if it can easily represent many real-world middleboxes and has practical applications. In this section, we first describe how we constructed the models described in the previous section and then discuss the applications of our model in planning and troubleshooting existing middlebox deployments and in guiding the development of new network architectures.

Model Instances

The models for the firewall, NAT, and layer-4 and layer-7 load balancers illustrated in the previous section were constructed by analyzing generic middlebox descriptions and taxonomies (like RFC 3234 [1]), consulting middlebox-specific manuals, and observing the working of the following real-world middleboxes:
• Linux Netfilter/iptables software firewall
• Netgear home NAT
• BalanceNg layer-4 software load balancer
• HAProxy layer-7 load balancer (VMware appliance)

We prototyped a black box testing-based model-inference tool to aid middlebox model construction. The tool infers hints about the operations of a middlebox by carefully sending different kinds of packets on one zone and observing the packets emerging from other zones, as illustrated in Fig. 2. The following are some of the inferences generated by it:
• The firewall does not modify packets; all packets sent by the tool emerge unmodified or are dropped.
• The load balancers only process packets addressed to them at layers 2, 3, and 4.
• The layer-4 load balancer rewrites the destination IP and MAC addresses of packets in the inet → srvr direction and the source addresses in the reverse direction. This inference was made by pairing and analyzing packets with identical payloads seen at the two zones of the load balancer. By using a relaxed payload similarity metric, the header rewriting rules for even the layer-7 load balancer were partially inferred.
• The layer-4 load balancer caches source MAC addresses of packets processed by it in the inet → srvr direction and uses them in packets in the reverse direction. This inference was made by correlating rewritten packet header fields with values seen in earlier packets.

Our inference tool is quite basic and serves only as an aid for model construction. It is not fully automated; for example, it requires the IP address and TCP port of the load balancer as input to avoid an exhaustive IP address search for packets


Rule box 1.

(i) Z(int, [hd]) ∧ I(<*>, [hd]) ∧ !S : get?(int, [h.si, h.sp])
    ⇒ Z(ext, [SNATfwd(h, newport)d]) ∧ set(int, [h.si, h.sp] → newport) ∧ set(ext, newport → [h.si, h.sp])

    SNATfwd([sm, dm, si, di, sp, dp], PORT) = [MACNAT, MACgw, IPNAT, di, PORT, dp]

(ii) Z(int, [hd]) ∧ I(<*>, [hd]) ∧ S : get?(int, [h.si, h.sp])
    ⇒ Z(ext, [SNATfwd(h, S)d])

(iii) Z(ext, [hd]) ∧ I(<di=IPNAT, dm=MACNAT>, [hd]) ∧ S : get?(ext, h.dp)
    ⇒ Z(int, [SNATrev(h, S.si, S.sp)d])

    SNATrev([sm, dm, si, di, sp, dp], IP, PORT) = [MACNAT, MACIP, si, IP, sp, PORT]

(iv) Z(int, [hd]) ∧ I(<*>, [hd]) ∧ S : get?(int, [h.si, h.sp])
    ⇒ DROP([hd]) ∧ WARN(inconsistent state)

(v) Z(ext, [hd]) ∧ I(<di=IPNAT, dm=MACNAT>, [hd]) ∧ S : get?(ext, h.dp)
    ⇒ DROP([hd]) ∧ WARN(inconsistent state)

Rule box 2.

(i) Z(inet, [hd]) ∧ I(<dm=MACLB, di=IPLB, dp=80>, [hd]) ∧ newflow?([hd])
    ⇒ Z(srvr, [DNATfwd(h, Wi)d]) ∧ set(inet, h.5tpl → Wi) ∧ set(srvr, DNATfwd(h, Wi)rev.5tpl → true)

    DNATfwd([sm, dm, si, di, sp, dp], W) = [SM, MACW, si, IPW, sp, dp]

(ii) Z(inet, [hd]) ∧ I(<dm=MACLB, di=IPLB, dp=80>, [hd]) ∧ !newflow?([hd]) ∧ S : get?(inet, h.5tpl)
    ⇒ Z(srvr, [DNATfwd(h, S)d])

(iii) Z(srvr, [hd]) ∧ I(<sm=MACWi, si=IPWi, sp=80>, [hd]) ∧ S : get?(srvr, h.5tpl)
    ⇒ Z(inet, [DNATrev(h)d])

    DNATrev([sm, dm, si, di, sp, dp]) = [MACLB, MACgw, IPLB, di, sp, dp]


accepted by it. The inferred packet header transformation rules and state fields may not be 100 percent accurate and thus only serve to guide further analysis. For middleboxes like SSL offload boxes that completely transform packet payloads, the tool cannot infer the processing rules.
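The payload-pairing inference mentioned in the list of inferences can be sketched as follows: packets sent into one zone are matched with packets observed at another zone by identical payload, and the header fields that differ are reported as rewritten. The function name and packet representation are assumptions; this is not the authors' tool, and it uses exact payload equality rather than a relaxed similarity metric.

```python
def infer_rewrites(sent, seen):
    """Pair packets by identical payload; report which header fields changed."""
    by_payload = {p["payload"]: p for p in seen}
    changed = set()
    for p in sent:
        q = by_payload.get(p["payload"])
        if q is None:
            continue  # packet was dropped, or its payload was transformed
        changed |= {f for f in p if f != "payload" and p[f] != q.get(f)}
    return changed


sent = [{"dm": "02:00:00:00:00:01", "di": "192.0.2.10", "dp": 80,
         "payload": b"probe-1"}]
seen = [{"dm": "02:00:00:00:00:b1", "di": "10.1.0.1", "dp": 80,
         "payload": b"probe-1"}]
print(sorted(infer_rewrites(sent, seen)))  # ['di', 'dm']
```

An empty result for packets that do reappear is consistent with a device that forwards packets unmodified, such as the firewall; a device that rewrites payloads defeats this pairing entirely, as noted for SSL offload boxes.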

We believe that completely inferring middlebox models through black box testing alone is impossible. If the source code for a middlebox implementation were available, we hypothesize that automatic white box software test-generation tools like directed automated random testing (DART) [5] could be adapted to infer middlebox model parameters. Automatically parsing middlebox configuration manuals to extract models is another open research direction.

We envision an online repository containing models of common middleboxes. We set up a pilot version of such a repository at http://www.middlebox.org with the models described in this article. We hope that middlebox manufacturers and network administrators who use middleboxes will contribute additional models to the repository.

We also prototyped a model validation tool that analyzes traffic traces collected from the different zones of a middlebox and verifies whether its operations are consistent with its model downloaded from the repository. Apart from flagging errors and incompleteness in the models themselves, the validation tool can be used to detect unexpected middlebox behavior, as we describe next.

Network Planning and Troubleshooting

The middlebox model clearly describes how various middleboxes under different configurations interact with the network and with each other in a standard and concise format. This information aids in planning new middlebox deployments and in monitoring and troubleshooting existing ones.

The input preconditions of a middlebox specify the types of packets expected by it and thus help a network architect plan the network topology and middlebox placement required to deliver the correct packets to it. The input preconditions and processing rules together help in analyzing the feasibility of placing different middleboxes in sequence. For example, because the right-hand sides of the firewall processing rules do not interfere with the conditions on the left-hand sides of the load balancer processing rules, the firewall can be placed in front of the load balancer with little scrutiny. However, placing the load balancer before the firewall requires more careful analysis as the destination address rewriting indicated by the processing rules of the load balancer may interfere with the Caccept and Cdrop clauses of the firewall.

The middlebox processing rules specify the packets flowing in different parts of a network. This information can be used to statically analyze and detect problems with a middlebox deployment before actual network rollout. It also aids in troubleshooting existing middlebox deployments and enhances automated traffic monitoring and anomaly detection. For example, the model validation tool helped us detect unexpected NAT behavior in the home network of one of the authors. The author’s home NAT was not rewriting the source port numbers of the packets sent by internal hosts. The tool automatically flagged this behavior as a violation of rules (i) and (ii) of our NAT model. We expected the multi-interface home NAT to use source port translation to support simultaneous TCP connections to the same destination from the same source port on multiple internal hosts. The failure of such simultaneous TCP connections on further investigation confirmed the anomaly. Although a small example, this experience indicates that our middlebox model holds practical utility in detecting unexpected middlebox behavior.

Guide Networking Research

Our middlebox model provides networking researchers with clear and concise descriptions of how various middleboxes operate. Such information is very useful for researchers, as well as companies involved in developing new network architectures, especially those that deal with middleboxes [6]. Not only does it provide hints to make a new architecture compatible with existing middleboxes, but it also helps identify middleboxes that cannot be supported.

In retrospect, the availability of a middlebox model would have greatly benefited our research on designing the policy-aware switching layer (PLayer) [4], alluded to earlier. The PLayer consists of enhanced layer-2 switches (pswitches) that explicitly forward packets to the middleboxes specified by a network administrator. In our original (erroneous) design, pswitches rewrote the source MAC addresses of packets processed by a transparent firewall to a unique dummy MAC address to mark packets that had already been processed by the firewall. Contrary to our expectation that the load balancer would use ARP, it cached the dummy source MAC addresses of packets in the forward flow direction and used them to address packets in the reverse direction. Such packets never reached their intended destinations. The presence of the source MAC address in the interest and state fields of the load balancer would have helped us debug this problem more quickly. Moreover, it would have warned us against rewriting the source MAC address in our original design, thus avoiding a time-consuming redesign.

Limitations

The model presented in this article is only a first step toward modeling middleboxes. Its three main limitations are:
• The inability to describe highly-specific middlebox operations in detail
• The lack of formal coverage proofs
• The complexity of model specification

The goal of building a general middlebox model that can describe a wide variety of middleboxes precludes our model from representing functionality that is very specific to a particular middlebox. We can extend our model easily using middlebox-specific models like the Unified Firewall Model as described earlier, although at the expense of reducing model simplicity and conciseness. The desire for simplicity and conciseness also prevents our model from capturing accurate timing and causality between the triggering of different processing rules.

Rule box 3.

(i) Z(inet, [hd]) ∧ I(...) ∧ newflow?([hd]) → set(inet, h.5tpl – [tcpCL = TCP.new, DCL = Data.new, hCL = h])

(ii) Z(inet, [hd]) ∧ I(...) ∧ !newflow?([hd]) ∧ S : get?(inet, h.5tpl) → S.tcpCL.recv(h) ∧ S.DCL+d

(iii) S.tcpCL.ready? → Z(inet, S.tcpCL.send(S.hCL^rev, S.DLS.read))

(iv) S.DCL.url? → S.hLS = DNATfwd(S.hCL, Wi) ∧ set(srvr, S.hLS^rev.5tpl – S) ∧ S.DLS = Data.new ∧ S.tcpLS = TCP.new

(v) Z(srvr, [hd]) ∧ I(...) ∧ S : get?(srvr, h.5tpl) → S.tcpLS.recv(h) ∧ S.DLS+d

(vi) S.tcpLS.ready? → Z(srvr, S.tcpLS.send(S.hLS, S.DCL.read))

Rule box 4.

PERIODIC → Z(srvr, PROBE(IPWi))

Z(inet, [hd]) ∧ S : get?(inet, h.5tpl) ∧ !S’ : get?(-, IPS) → Z(srvr, ARPREQ(IPS))

Z(srvr, ARPRPLY(IP, MAC)) → set(-, IP – MAC)

On the other hand, our model may not be general enough to describe all possible current and future middleboxes. Although we represented many common middleboxes in our model and are not aware of any existing middleboxes that cannot be represented, we are unable to formally prove that our model covers all possible middleboxes.

The model for a particular middlebox consists of a small number (typically < 10) of processing rules. However, constructing the model itself is a non-trivial task even with support from our model inference and validation tools. We expect models to be constructed by experts and shared through an online model repository, thus making them easily available to all, without requiring widespread model construction skills.

Related Work

The middlebox model described in this article is placed at an intermediate level in between related work on very general network communications models and very specific middlebox models.

An Axiomatic Basis for Communication [7] presents a general network communications model that axiomatically formulates packet forwarding, naming, and addressing. This article presents a model tailored to represent middlebox functionality and operations. The processing rules and state database in our model are similar to the forwarding primitives and local switching table in [7]. As part of future work, we plan to investigate the integration of the two models and thus combine the practical benefits of our middlebox model (e.g., middlebox model inference and validation tools, model repository) and the theoretical benefits of the general communications model (e.g., formal validation of packet forwarding correctness through chains of middleboxes).

Predicate routing [8] attempts to unify security and routing by declaratively specifying network state as a set of Boolean expressions dictating the packets that can appear on various links connecting together end nodes and routers. This approach can be extended to represent a subset of our middlebox model. For example, Boolean expressions on the ports and links (as defined by predicate routing) of a middlebox can specify the input preconditions of our model and indirectly hint at the processing rules and transformation functions. From a different perspective, middlebox models from our repository can aid the definition of the Boolean expressions in a network implementing predicate routing.

Reference [9] uses statistical rule mining to automatically group together commonly occurring flows and learn the underlying communication rules in a network. Our work has a narrower and more detailed focus on how middleboxes operate. Reference [10] uses detailed measurement techniques to evaluate the performance and reliability of production middlebox deployments. We plan to investigate how the techniques described in these papers can enhance our model inference and validation tools.

RFC 3234 [1] presents a taxonomy of middleboxes. Our model goes well beyond a taxonomy and describes middlebox packet processing in more detail using a concise and standard language. In addition, our model can naturally induce a more fine-grained taxonomy on middleboxes (e.g., “middleboxes that rewrite the destination IP and port number” versus “middleboxes operating at the transport layer”). Our model does not currently consider the middlebox failover modes and functional versus optimizing roles identified by RFC 3234.

The Unified Firewall Model [2] and IETF BEHAVE [3] working group characterize the functionality and behavior of specific middleboxes (firewalls and NATs in this case). Guided by these efforts, we construct a general model that applies to a wide range of middleboxes and enables us to compare different middleboxes and study their interactions. Furthermore, these specific models can be plugged into our general model and alleviate the limitations of model generality.

Conclusion

In this article, we presented a simple middlebox model and illustrated how various commonly used middleboxes can be described by it. The model guides middlebox-related research and aids middlebox deployments. Our work is only an initial step in this direction and calls for the support of the middlebox research and user communities to further refine the model and to contribute model instances for the many different kinds of middleboxes that exist today.

References

[1] “Middleboxes: Taxonomy and Issues,” RFC 3234.
[2] G. J. Nalepa, “A Unified Firewall Model for Web Security,” Advances in Intelligent Web Mastering.
[3] “Behavior Engineering for Hindrance Avoidance”; http://www.ietf.org/html.charters/behave-charter.html
[4] D. Joseph, A. Tavakoli, and I. Stoica, “A Policy-Aware Switching Layer for Data Centers,” Proc. SIGCOMM, 2008.
[5] P. Godefroid, N. Klarlund, and K. Sen, “DART: Directed Automated Random Testing,” Proc. PLDI, 2005.
[6] M. Walfish et al., “Middleboxes No Longer Considered Harmful,” Proc. OSDI, 2004.
[7] M. Karsten et al., “An Axiomatic Basis for Communication,” Proc. SIGCOMM ’07.
[8] T. Roscoe et al., “Predicate Routing: Enabling Controlled Networking,” SIGCOMM Comp. Commun. Rev., vol. 33, no. 1, 2003.
[9] S. Kandula, R. Chandra, and D. Katabi, “What’s Going On? Learning Communication Rules in Edge Networks,” Proc. SIGCOMM, 2008.
[10] M. Allman, “On the Performance of Middleboxes,” Proc. IMC, 2003.

Biographies

DILIP JOSEPH ([email protected]) received his B.Tech. degree in computer science from the Indian Institute of Technology, Madras, in 2004 and his M.S. degree in computer science from the University of California at Berkeley in 2006. He is currently a Ph.D. candidate at the University of California at Berkeley. His research interests include data center networking, middleboxes, and new Internet architectures.

ION STOICA ([email protected]) received his Ph.D. from Carnegie Mellon University in 2000. He is an associate professor in the EECS Department at the University of California at Berkeley, where he does research on peer-to-peer network technologies in the Internet, resource management, and network architectures. He is the recipient of the 2007 Rising Star Award, a Sloan Foundation Fellowship (2003), a Presidential Early Career Award for Scientists and Engineers (PECASE) (2002), and the ACM doctoral dissertation award (2001). In 2006 he co-founded Conviva, a startup company to commercialize peer-to-peer technology for video distribution.

Figure 2. Middlebox model inference tool analyzing a load balancer. [Diagram labels: Model inference tool; Internet zone; Server zone; Observe; Control packet sending.]


Network Address Translation for the Stream Control Transmission Protocol

Michael Tüxen and Irene Rüngeler, Münster University of Applied Sciences
Randall Stewart, The Resource Group
Erwin P. Rathgeb, University of Duisburg-Essen

Abstract

Network address translation is widely deployed in the Internet and supports the Transmission Control Protocol and the User Datagram Protocol as transport layer protocols. Although it is part of the kernels of all recent Linux distributions and of the FreeBSD 7 and Solaris 10 operating systems, the new Internet Engineering Task Force transport protocol, the Stream Control Transmission Protocol (SCTP), is not supported on most NAT middleboxes yet. This article discusses the deficiencies of using existing NAT methods for SCTP and describes a new SCTP-specific NAT concept. This concept is analyzed in detail for several important network scenarios, including peer-to-peer, transport layer mobility, and multihoming.

Network address translation (NAT) is a common method for separating private networks from global networks by translating private Internet Protocol (IP) addresses to global IP addresses. Often there is only one global IP address available for multiple hosts inside the private network. In this case, the transport layer port number also is modified, and the method is called network address and port number translation (NAPT). NAT and NAPT have been in use for the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP) for a long time, but the Stream Control Transmission Protocol (SCTP), as a fairly new transport protocol, is not supported yet. Simply applying this method to SCTP does not work for multihomed associations.

Currently, NAT implementations that support SCTP in a way similar to TCP or UDP are being developed first. Although this works well for single-homed SCTP associations, it does not work for multihomed SCTP associations. This makes these solutions non-applicable for typical SCTP applications that require multihoming. However, in these cases, some vendors and operators also want to use NAT middleboxes for various reasons. Therefore, it is important to have NAT middleboxes that not only support SCTP in a limited way, but with all features, especially multihoming.

In [1] and [2], the authors of this article describe an approach to integrate SCTP in network address translators for single-homed client-server communication. This article extends this method in a way that also works in the case of multihomed and peer-to-peer scenarios. Additionally, it covers the case of transport layer mobility or routing changes in the network. These additions also will be provided to the Internet Engineering Task Force (IETF) for standardization.

The structure of this article is as follows: first, we provide an introduction to SCTP, emphasizing the features that are relevant for this article. We discuss generic NAT and NAPT methods for traffic based on TCP or UDP and for traffic that is not based on these protocols. Their applicability for SCTP is analyzed. An SCTP-specific method for NAT middleboxes that overcomes the deficiencies of the generic methods is described. Several examples are given explaining in detail how the SCTP-specific NAT method works for different scenarios including single-homed and multihomed client-server scenarios, peer-to-peer scenarios, and transport-layer mobility scenarios. Then, conclusions are presented.

Introduction to the Stream Control Transmission Protocol

SCTP is currently specified in [5]. It was standardized by the IETF as the generic transport protocol for signaling transport in IP-based telephone signaling networks.

SCTP is a connection-oriented protocol providing reliable transport of user messages. It supports IPv4 and IPv6 as a network layer. A connection between two SCTP end points is called an SCTP association or just an association.

One of the major design goals was network fault tolerance, and therefore, each SCTP end point can use multiple IP addresses within each association but only one port number. Each IP address of the peer can be used as the destination address of a packet. Currently, this multihoming support is used only for redundancy, but ongoing research is analyzing the possibility of also using it for load sharing.

SCTP is already part of all recent Linux distributions, the Solaris 10 operating system, and the FreeBSD 7.0 release. It is deployed in the signaling networks of telephony network operators and is used in IP-based signaling for universal mobile telecommunication system (UMTS) networks. Other applications using SCTP include the IP Flow Information Export (IPFIX) protocol, Diameter, and the Reliable Server Pooling (RSerPool) protocol suite. It should be noted that SCTP, specified by the IETF in 2000, was the first transport protocol to be deployed in commercial networks after the introduction of UDP and TCP in the 1980s. Four years later, a modification of UDP with limited checksum coverage, UDP-Lite, was standardized and is used in Third Generation Partnership Project (3GPP) networks. In 2006, the IETF standardized the Datagram Congestion Control Protocol (DCCP). Currently, neither UDP-Lite nor DCCP is available on major operating systems.

An SCTP packet consists of a common header followed by a number of chunks. The common header contains source and destination port numbers similar to TCP or UDP headers, a 32-bit verification tag, and a CRC32C checksum. The checksum covers only the SCTP packet and does not take any kind of pseudo header into account. Each chunk consists of a type field, eight flags, a length field, and type-specific data. Furthermore, it is padded at the end to be 32-bit aligned.
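The packet layout just described can be sketched as a small parser. This is only an illustration: the function name and the returned dictionary are assumptions made for the example, and the field widths (one byte of type, one byte of flags, a two-byte length that counts the chunk header but not the padding) are taken from the SCTP specification rather than from this article.

```python
import struct

# Common header: source port, destination port, verification tag, checksum.
COMMON_HEADER = struct.Struct("!HHII")
# Chunk header: type, flags, length (length includes these four bytes).
CHUNK_HEADER = struct.Struct("!BBH")


def parse_sctp_packet(packet: bytes) -> dict:
    """Split an SCTP packet into its common header fields and its chunks."""
    if len(packet) < COMMON_HEADER.size:
        raise ValueError("packet shorter than the SCTP common header")
    src_port, dst_port, vtag, checksum = COMMON_HEADER.unpack_from(packet, 0)

    chunks = []
    offset = COMMON_HEADER.size
    while offset < len(packet):
        if offset + CHUNK_HEADER.size > len(packet):
            raise ValueError("truncated chunk header")
        ctype, flags, length = CHUNK_HEADER.unpack_from(packet, offset)
        if length < CHUNK_HEADER.size or offset + length > len(packet):
            raise ValueError("malformed chunk length")
        value = packet[offset + CHUNK_HEADER.size:offset + length]
        chunks.append({"type": ctype, "flags": flags, "value": value})
        # Skip the padding so the next chunk starts on a 32-bit boundary.
        offset += (length + 3) & ~3

    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "verification_tag": vtag,
        "checksum": checksum,  # raw field value, not verified here
        "chunks": chunks,
    }
```

A NAT middlebox following the SCTP-specific approach would need exactly these fields: the port numbers and verification tag from the common header, and the first chunk to recognize INIT and INIT-ACK.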

The basic association setup procedure is based on a four-way handshake and follows the client-server principle. It is shown on the left-hand side of Fig. 1. The first SCTP message is sent from the client to the server. It contains exactly one chunk, the initiation (INIT) chunk. The INIT chunk contains a 32-bit random number, the initiate tag, and the list of IP addresses used by the client. The server responds with an SCTP message, which also contains just one chunk, the initiation acknowledge (INIT-ACK) chunk. It also contains a 32-bit random initiate tag and the list of addresses of the server. If the client or server is single-homed, the list of addresses in the INIT or INIT-ACK chunk should be empty. After the server has sent the INIT-ACK chunk, it does not hold any state regarding the association. Instead, it puts all information in a state cookie, which itself is put into the INIT-ACK chunk. On reception of the INIT-ACK chunk, the client sends the state cookie in a COOKIE-ECHO chunk to the server. On reception of the COOKIE-ECHO chunk, the server responds with a COOKIE-ACK chunk, and the association is established. Other chunks might be bundled with the COOKIE-ECHO or COOKIE-ACK chunk in the third and fourth message.

The verification tag in the common header is always the initiate tag sent by the peer in the INIT or INIT-ACK message during the association setup. This is used to protect associations against blind attacks. Only the common header of the packet containing the INIT chunk has the verification tag 0. It is important to note that most SCTP implementations use the verification tag for looking up the association when a packet is received. Some implementations even ensure that the verification tags are unique across all associations currently known.

SCTP supports not only the client-server model for association setup, but also the more general peer-to-peer model. Both end points can start the four-way handshake at about the same time, and the SCTP setup procedure ensures that exactly one association is established. This is called a collision case. An example message flow is shown on the right-hand side in Fig. 1.

It is also possible that one side starts the association procedure while the peer is still in the established state. This might happen, for example, if one side reboots without tearing down the association and then starts the association setup procedure. The four-way handshake succeeds, and for the server side, the association restarts. One example is shown in the middle of Fig. 1; detailed descriptions of the handling of all the possible cases are given in [9].

If an SCTP end point must terminate an association immediately, it can send a packet containing an ABORT chunk. This chunk also is sent in response to almost all packets for which no association can be looked up. On reception of an ABORT chunk, the association is terminated. Error conditions can be signaled by sending an ERROR chunk. ABORT and ERROR chunks can include the causes of the error in order to provide more detailed information.

In addition to the base protocol, several extensions also were standardized and implemented. The SCTP extension that is crucial for this article is the ability to add or delete IP addresses dynamically during the lifetime of an SCTP association. This is specified in [6]. If an SCTP end point wants to add or delete an IP address, it sends an address configuration change (ASCONF) chunk that contains the address to be added or deleted and an address that can be used to look up the association, the so-called lookup address. When the peer has processed an ASCONF chunk, it sends back an address configuration acknowledgment (ASCONF-ACK) chunk. There is a special rule that if the address to be added is the wildcard address (0.0.0.0 for IPv4 or ::0 for IPv6), the source address of the packet containing the ASCONF chunk is added. If the address to be deleted is the wildcard address, all addresses except for the source address of the packet containing the ASCONF chunk are deleted.
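The wildcard rule for ASCONF can be illustrated with a short sketch that models the peer's address list of one association as a set of strings. The function name, the "add"/"delete" action labels, and the set representation are assumptions made for this example only; real ASCONF processing involves further checks that are not shown.

```python
# Wildcard addresses named in the text (plus the equivalent short IPv6 form).
WILDCARD_ADDRESSES = {"0.0.0.0", "::0", "::"}


def apply_asconf(addresses, action, address, packet_source):
    """Return the address set of an association after one ASCONF parameter.

    addresses      -- addresses currently known for the peer
    action         -- "add" or "delete" (hypothetical labels)
    address        -- address carried in the ASCONF chunk
    packet_source  -- source address of the packet carrying the chunk
    """
    result = set(addresses)
    if action == "add":
        # Wildcard add: the packet's source address is added instead.
        result.add(packet_source if address in WILDCARD_ADDRESSES else address)
    elif action == "delete":
        if address in WILDCARD_ADDRESSES:
            # Wildcard delete: everything except the packet's source goes.
            result &= {packet_source}
        else:
            result.discard(address)
    else:
        raise ValueError("unknown ASCONF action: %r" % (action,))
    return result
```

The wildcard form matters behind a NAT middlebox because the end point does not know which global address its packets will carry; the receiver learns it from the packet itself.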

Applicability of Generic Methods for NAT or NAT Traversal

UDP or TCP-like Network Address and Port Number Translation

NAT in its original meaning is realized by changing the (private) IP address of the client to a global address of the NAT middlebox and keeping this correlation in a table (Fig. 2). Thus, the server addresses its packets to this global address, and they reach the NAT middlebox, which substitutes the destination address with the address of the client. This is a feasible method, as long as the source ports of the clients connecting to the same server are different. The source port numbers are chosen dynamically from operating system dependent ranges. Some operating systems use the port numbers between 49152 and 65535. Because many clients can be located behind the same NAT middlebox, and these clients might access a very popular server at about the same time, the chance that two clients get the same port is non-negligible.

Therefore, TCP or UDP sessions usually are translated by changing the private IP address and, additionally, the private port number to a global IP address and port number in the TCP or UDP header, respectively. This method is called NAPT. Thereby, the NAT middlebox chooses the port numbers from a pool and makes sure that no two connections to the same server obtain the same port numbers.

Figure 1. Examples of the SCTP association setup. [Three message sequence charts between Host A and Host B showing exchanges of INIT, INIT-ACK, COOKIE-ECHO, and COOKIE-ACK chunks.]

As the transport layer checksum of the TCP and UDP packets covers the transport header that includes the port numbers, it must be modified according to the port number change. However, the checksum used for TCP or UDP has the property that the change of the checksum can be computed from the change of the port numbers alone. So this can be done very efficiently by a simple set of additions and subtractions.
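The incremental update mentioned here can be sketched with the 16-bit one's complement arithmetic of the Internet checksum. The update formula used below, HC' = ~(~HC + ~m + m'), is the one commonly attributed to RFC 1624; the function names are chosen for this example, and the pseudo header that TCP and UDP also cover is left out for brevity (an address change would be handled the same way, one 16-bit word at a time).

```python
def ones_complement_sum(words):
    """Add 16-bit words with end-around carry."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)
    return total


def internet_checksum(data: bytes) -> int:
    """Full checksum over a byte string (padded to an even length)."""
    if len(data) % 2:
        data += b"\x00"
    words = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    return ~ones_complement_sum(words) & 0xFFFF


def update_checksum(old_checksum: int, old_word: int, new_word: int) -> int:
    """Adjust a checksum after one 16-bit word changed: HC' = ~(~HC + ~m + m')."""
    total = ones_complement_sum(
        [~old_checksum & 0xFFFF, ~old_word & 0xFFFF, new_word & 0xFFFF]
    )
    return ~total & 0xFFFF
```

The update touches three 16-bit values regardless of the packet size, which is why port translation is cheap for TCP and UDP.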

It should be noted that the behavior of NAT middleboxes varies dramatically because there were no standards describing how to build them. The Behavior Engineering for Hindrance Avoidance (BEHAVE) working group of the IETF develops best current practice (BCP) documents giving requirements for NAT middlebox behavior and protocols to help applications to run over networks with NAT middleboxes.

Considering only single-homed SCTP clients and servers, it is also possible to use this NAPT concept for SCTP because it has the same port number concept as TCP and UDP. However, the transport layer checksum used by SCTP is different from the one used by UDP and TCP. This checksum does not allow the computing of the checksum change based only on the port number change. Therefore, the NAT middlebox must compute the new SCTP checksum again, based on the complete SCTP packet. This requires a substantial amount of computing power that might be reduced when the computation is performed directly by hardware.
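To make the cost difference concrete, the following sketch recomputes a CRC32C over a whole packet after a port change. It is a plain table-driven implementation using the reflected Castagnoli polynomial; the helper names are assumptions for this example, and the sketch only returns the checksum value, leaving out how the value is written back into the header.

```python
import struct


def _build_crc32c_table():
    """Precompute the 256-entry table for the reflected polynomial 0x82F63B78."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _build_crc32c_table()


def crc32c(data: bytes) -> int:
    """CRC32C over the given bytes; every byte of the input is visited."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def checksum_after_port_change(packet: bytes, new_source_port: int) -> int:
    """Rewrite the source port and recompute the checksum over the full packet.

    The checksum field (bytes 8..11 of the common header) is zeroed before
    the computation, as it is not part of its own input.
    """
    rewritten = struct.pack("!H", new_source_port) + packet[2:8] + b"\x00" * 4 + packet[12:]
    return crc32c(rewritten)
```

Unlike the one's complement case, there is no three-word shortcut here: the loop in `crc32c` runs once per byte of the packet.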

For multihomed SCTP clients and servers, reusing the techniques from TCP and UDP becomes much harder. As we mentioned earlier, hosts can be multihomed, which means that they can simultaneously use multiple network addresses and thus can be attached to multiple networks. Therefore, the traffic of one SCTP association, in general, passes through different NAT middleboxes on different paths. Because each SCTP end point can use only one SCTP port number on all paths, the NAT middleboxes cannot change the port number independently. To apply the existing NAT concept, the NAT middleboxes involved would have to synchronize the port numbers to assign a common number for the association. This is very hard to achieve.

Based on this discussion, it seems desirable to use a NAT mechanism for SCTP that does not require any change to the SCTP header, and hence to the port numbers. This avoids both the synchronization among NAT middleboxes and the recomputation of the SCTP checksum.

UDP-Based Tunneling

Currently, most NAT middleboxes support only protocols running on top of TCP or UDP. A standard technique for all other protocols is to encapsulate these packets into UDP instead of IP. Because both UDP and IP provide an unreliable packet delivery service, this is feasible. This also works for SCTP, as described in [3], and is currently implemented in the SCTP kernel extension for Mac OS X.

It should be noted that NAT middleboxes on different paths are not synchronized, and therefore, the UDP port number might be different on different paths.

One drawback of using UDP encapsulation is that Internet Control Message Protocol (ICMP) messages might not contain enough information to be processed by the SCTP layer. Another drawback is that the simple peer-to-peer solution described in the sections about peer-to-peer communication and multihoming with a rendezvous server does not work because the UDP port numbers might be changed by NAT middleboxes.

Tunneling SCTP over UDP must handle the same problems as any other UDP-based communication for NAT traversal. However, this is the only possibility for SCTP-based communication through a NAT middlebox without modifying it to add SCTP support.

An SCTP-Specific Variant of NAT

In the NAPT method described previously, the NAT middlebox controls the 16-bit source port number of outgoing TCP connections to distinguish multiple TCP connections of all clients behind the NAT middlebox to the same server. The basic idea for the SCTP-specific method is instead to use the combination of the source port number and the verification tag. For single-homed hosts, this method is described in [2].

If NAT middleboxes use the verification tags together with the addresses and the port numbers to identify an association, the probability that two hosts end up with the same combination decreases to a tolerable level.
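A rough feel for "tolerable" can be obtained from a birthday-style calculation. The sketch below assumes that every client picks its source port uniformly from the range 49152 to 65535 and its initiate tag uniformly from 2^32 values, independently of the others, and that all clients contact the same server at the same time; these are simplifying assumptions for illustration, not measurements, and the client count of 100 is arbitrary.

```python
import math


def collision_probability(n_clients: int, space_size: int) -> float:
    """Probability that at least two of n uniform random picks coincide."""
    if n_clients > space_size:
        return 1.0
    # Sum the logarithms of (1 - i/N) to stay accurate for tiny probabilities.
    log_all_distinct = sum(math.log1p(-i / space_size) for i in range(n_clients))
    return -math.expm1(log_all_distinct)


EPHEMERAL_PORTS = 65535 - 49152 + 1   # 16384 source ports
VTAG_VALUES = 2 ** 32                 # 32-bit initiate tags (approximation)

ports_only = collision_probability(100, EPHEMERAL_PORTS)
ports_and_vtags = collision_probability(100, EPHEMERAL_PORTS * VTAG_VALUES)
```

Under these assumptions, `ports_only` comes out at roughly one in four for 100 clients, whereas `ports_and_vtags` is many orders of magnitude smaller, which is the effect the text relies on.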

A Simple Association Setup

The main task of a NAT middlebox is to substitute the source address of each packet with the public address used by the NAT middlebox and to keep the corresponding IP addresses in a table. First, we consider an association setup between a single-homed client and a single-homed server. Neither the INIT nor the INIT-ACK chunk contains any IP addresses. This leads to a scheme as described in Fig. 3.

In the first message of the handshake, the verification tag in the common header must be set to 0, but the initiate tag (initTag) in the INIT chunk holds a 32-bit random number that is supposed to be the verification tag (VTag) of the incoming packets. Hence, at the beginning of the handshake, only one verification tag is known. The NAT middlebox keeps track of this information and takes the local private address (Local-Address) and the officially registered destination IP address (Global-Address) from the IP header of the SCTP packet and saves them in the NAT table (Fig. 3). The local source port (Local-Port) and the destination port (Global-Port) are obtained the same way.

The initiate tag of the INIT chunk, which the client has chosen for its communication, is also extracted from the INIT chunk header and saved as Local-VTag. The Global-VTag that eventually will be chosen by the communication partner is not known yet. Before forwarding the packet, the NAT middlebox exchanges the source address of the IP header with the NAT address (Nat-Global-Address) and sends the packet toward the other end point.

Figure 2. Using basic NAT. [Diagram: clients 10.1.0.1:52001, 10.1.0.2:52002, and 10.1.0.3:52003 behind a NAT middlebox with the global address 120.10.2.1 connect across the Internet to the server 100.4.5.1:8080; the translated flows are 120.10.2.1:52001 => 100.4.5.1:8080, 120.10.2.1:52002 => 100.4.5.1:8080, and 120.10.2.1:52003 => 100.4.5.1:8080.]

The other SCTP end point receiving the packet containing the INIT chunk answers the request with a message containing the INIT-ACK chunk. This message is addressed to the NAT-Global-Address and the Local-Port. Its verification tag in the common header must be identical to the initiate tag of the INIT chunk, whereas the initiate tag of the INIT-ACK chunk will be used as the verification tag for all packets that are sent by the initiating end point (client 10.1.0.1 in the figure) of the association. For an incoming INIT-ACK chunk, the NAT middlebox searches the table entries for the corresponding combination of Local-Port, Global-Address, Global-Port, and the Local-VTag and adds the Global-VTag. Thus, after the reception of the INIT-ACK chunk, both verification tags are known. Now the NAT middlebox sets the destination address to the Local-Address found in the table entry and delivers the packet. To complete the handshake, a packet with a COOKIE-ECHO chunk is sent that is acknowledged with a message containing a COOKIE-ACK chunk.

NAT Table

The NAT table consists of several entries. Each entry is a tuple consisting of:
1) Local-Address
2) Global-Address
3) Local-Port
4) Global-Port
5) Local-VTag
6) Global-VTag

In addition to the procedure to modify the table given in the next subsection, a timer must be used to remove entries that have not been used for a certain amount of time. This timeout should be long enough that the SCTP path supervision procedure prevents the table entries from timing out.
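The table and the INIT/INIT-ACK handling described above can be sketched as follows. The class and method names, the explicit `now` argument, and the linear search are assumptions made to keep the example short; it covers only single-homed setup and the idle timer, not the complete method.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NatEntry:
    local_address: str
    global_address: str
    local_port: int
    global_port: int
    local_vtag: int
    global_vtag: Optional[int] = None   # unknown until the INIT-ACK arrives
    last_used: float = 0.0


class SctpNatTable:
    def __init__(self, nat_global_address: str, idle_timeout: float):
        self.nat_global_address = nat_global_address
        self.idle_timeout = idle_timeout
        self.entries: List[NatEntry] = []

    def outgoing_init(self, src_addr, src_port, dst_addr, dst_port, init_tag, now):
        """Record an INIT from the private side; return the new source address."""
        self.entries.append(
            NatEntry(src_addr, dst_addr, src_port, dst_port, init_tag, None, now)
        )
        # Only the IP source address changes; ports and SCTP header stay intact.
        return self.nat_global_address

    def incoming_init_ack(self, src_addr, src_port, dst_port, vtag, init_tag, now):
        """Match an INIT-ACK and return the private destination, or None."""
        for entry in self.entries:
            if (entry.global_address, entry.global_port,
                    entry.local_port, entry.local_vtag) == (src_addr, src_port,
                                                            dst_port, vtag):
                entry.global_vtag = init_tag   # both tags are now known
                entry.last_used = now
                return entry.local_address
        return None

    def expire(self, now):
        """Drop entries that have been idle longer than the timeout."""
        self.entries = [e for e in self.entries
                        if now - e.last_used <= self.idle_timeout]
```

With the values of Fig. 3, an INIT from 10.1.0.1:52001 to 100.4.5.1:8080 with initiate tag 12345 creates an entry, and the INIT-ACK carrying verification tag 12345 and initiate tag 45678 completes it and is delivered to 10.1.0.1.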

Modifications to the NAT Table

The basic procedure for handling INIT and INIT-ACK chunks was described previously. If the INIT or INIT-ACK chunk contains a list of addresses, then for each address in the list, an entry is added to the table.

If an ASCONF chunk is received to add the wildcard address, an entry to the NAT table is made for that address. Because both verification tags must be added, a parameter must be included in the ASCONF chunk that contains the verification tag that is not present in the common header.

Behavior of the SCTP End Points

Because multiple clients behind the NAT middlebox might choose the same local port when connecting to the same server, the restart procedure would result in a loss of an SCTP association. Therefore, the INIT chunk sent by the clients should contain a parameter indicating that the server should not follow the restart procedure. Instead it should use the verification tag to distinguish between the associations. This is what most SCTP implementations already do.

Furthermore, the SCTP end points must not include non-global addresses in the INIT or INIT-ACK chunk.

If an SCTP end point is multihomed and has non-global addresses, it should set up the association single-homed and then add the other addresses after the association has been established by sending an SCTP packet containing an ASCONF chunk for each address. To add such an address, the ASCONF should contain only the wildcard address and the parameter providing the required verification tag. The source address of the packet containing the ASCONF chunk will be added to the association.

To remove an address, an ASCONF chunk is sent with the wildcard address. Then, all addresses except the source address of the packet containing the ASCONF chunk are deleted from the association.

Communication between the NAT Middleboxes and the SCTP End Points

If a NAT middlebox receives an INIT chunk that would result in adding an entry to the NAT table that conflicts with an already existing entry, it should not insert this entry and may send an ABORT chunk back to the SCTP end point. In the ABORT chunk, an M-bit should be set that indicates that it has been generated by a middlebox. This happens if two different clients choose the same local port number and initiate tag and try to connect to the same server. On reception of such an ABORT chunk, the end point can try to choose a different initiate tag and try setting up the association again.
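The conflict case can be illustrated with a short Python sketch. It assumes that two entries conflict when they agree in Global-Address, Global-Port, Local-Port, and Local-VTag but belong to different Local-Addresses; the dictionary layout and the names `conflicts`, `nat_on_init`, and `connect` are hypothetical and used only for this example.

```python
def conflicts(table, new):
    """An INIT conflicts if an existing entry matches in everything
    except the Local-Address."""
    def key(e):
        return (e["global_addr"], e["global_port"],
                e["local_port"], e["local_vtag"])
    return any(key(e) == key(new) and e["local_addr"] != new["local_addr"]
               for e in table)


def nat_on_init(table, new):
    """Return None if the entry was inserted, else an ABORT with the M-bit."""
    if conflicts(table, new):
        return {"chunk": "ABORT", "m_bit": True, "cause": "NAT table conflict"}
    table.append(new)
    return None


def connect(table, local_addr, local_port, server, choose_tag, max_tries=3):
    """Send INIT with a fresh initiate tag until the NAT accepts one.
    Returns the accepted initiate tag, or None after max_tries attempts."""
    for _ in range(max_tries):
        tag = choose_tag()
        entry = {"local_addr": local_addr, "local_port": local_port,
                 "global_addr": server[0], "global_port": server[1],
                 "local_vtag": tag}
        if nat_on_init(table, entry) is None:
            return tag
        # ABORT with M-bit received: loop and pick a different initiate tag
    return None
```

In a real end point `choose_tag` would draw a random nonzero 32-bit value; passing it in as a callable keeps the sketch deterministic when it is exercised with fixed tags.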

Figure 3. Four-way handshake for the SCTP association setup with NAT table.

NAT table entry after each chunk:
Chunk type   Local-Port   Local-Address   Global-Address   Global-Port   Local-VTag   Global-VTag
INIT         52001        10.1.0.1        100.4.5.1        8080          12345        -
INIT-ACK     52001        10.1.0.1        100.4.5.1        8080          12345        45678

Messages between the client (10.1.0.1:52001) and the NAT (120.10.2.1):
INIT: 10.1.0.1:52001=>100.4.5.1:8080, Vtag=0, initTag=12345
INIT-ACK: 100.4.5.1:8080=>10.1.0.1:52001, Vtag=12345, initTag=45678
COOKIE-ECHO: 10.1.0.1:52001=>100.4.5.1:8080, Vtag=45678
COOKIE-ACK: 100.4.5.1:8080=>10.1.0.1:52001, Vtag=12345

Messages between the NAT (120.10.2.1) and the server (100.4.5.1:8080):
INIT: 120.10.2.1:52001=>100.4.5.1:8080, Vtag=0, initTag=12345
INIT-ACK: 100.4.5.1:8080=>120.10.2.1:52001, Vtag=12345, initTag=45678
COOKIE-ECHO: 120.10.2.1:52001=>100.4.5.1:8080, Vtag=45678
COOKIE-ACK: 100.4.5.1:8080=>120.10.2.1:52001, Vtag=12345


If the NAT middlebox receives an SCTP packet that cannot be processed because there is no entry in the NAT table, the NAT middlebox should discard the packet and can send back an ERROR chunk. An M-bit must be set to indicate that the chunk is generated by a middlebox, and an error cause should indicate that the NAT middlebox does not have the required information to process the packet. On reception of such an ERROR chunk, the end point should use an ASCONF chunk to provide the required information to the NAT middlebox.

New SCTP Protocol Elements

Clients require a new parameter to be included in the INIT chunk to indicate that they will use the procedures described in this article. This parameter also is included in the INIT-ACK chunk to indicate that the receiver also supports it. Another new parameter is required that can contain a verification tag and is included in an ASCONF chunk.

Both the ERROR chunk and the ABORT chunk must have an M-bit indicating that the packet containing the chunk is generated by a middlebox instead of the peer.

Two additional error causes are introduced, one to be included in the ERROR chunk to indicate that the NAT middlebox misses some state, and one to be included in the ABORT chunk to indicate a conflict in the NAT table.

Examples

This section provides a detailed discussion of several network scenarios involving NAT middleboxes. The proposed NAT mechanisms were verified in all these scenarios using an SCTP simulation in the INET framework for the OMNeT++ simulation kernel described in [10].

Furthermore, a group of the Center for Advanced Internet Architecture at Swinburne University is implementing this method for the FreeBSD operating system. This project, SCTP over NAT Adaptation (SONATA), is being implemented in cooperation with two of the authors and is based on [2].

Figure 4. Building the NAT table for the single-homed client with a multihomed server.

NAT table entries:
Chunk type      Local-Address   Global-Address   Local-Port   Global-Port   Local-VTag   Global-VTag
INIT+INIT-ACK   10.1.0.1        100.4.5.1        52001        8080          12345        45678
INIT-ACK        10.1.0.1        100.5.5.1        52001        8080          12345        45678

Elements shown: client (10.1.0.1:52001), NAT, Internet, Router 1, Router 2, server (100.4.5.1:8080 and 100.5.5.1:8080).

Figure 5. After a route change a new NAT middlebox appears.

Elements shown: client (10.1.0.1:52001), router, NAT and new NAT (addresses 120.10.2.1 and 140.1.1.1), Internet, server (100.4.5.1:8080).
Packets arriving at the server: 120.10.2.1:52001=>100.4.5.1:8080 and 140.1.1.1:52001=>100.4.5.1:8080.
Message sequence shown: DATA, ERROR (cause: NAT state missing), ASCONF (Vtag: 12345), ASCONF-ACK.

Single-Homed Client to Multihomed Server

In the case of a single-homed client and a multihomed server, the server announces all its global addresses in address parameters included in the INIT-ACK chunk (Fig. 4). The packet crosses the NAT middlebox, which updates its entries for the association. When the client receives the chunk, it adds those addresses to its list of destination addresses. As a result, there will be a separate entry for each server address although there is only one association.
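As a small illustration of this rule, the following Python sketch derives one table entry per announced server address, using the values of Fig. 4. The function name `entries_from_init_ack` and the tuple layout are assumptions made for this example only.

```python
def entries_from_init_ack(local_addr, local_port, global_port,
                          local_vtag, global_vtag, src_addr, listed_addrs):
    """One table entry per server address; all entries share the ports
    and both verification tags, because there is only one association."""
    addrs = [src_addr] + [a for a in listed_addrs if a != src_addr]
    return [(local_addr, a, local_port, global_port, local_vtag, global_vtag)
            for a in addrs]


rows = entries_from_init_ack("10.1.0.1", 52001, 8080, 12345, 45678,
                             "100.4.5.1", ["100.4.5.1", "100.5.5.1"])
for row in rows:
    print(row)
```

Both resulting entries differ only in the Global-Address field, which matches the two rows of the table in Fig. 4.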

Adding New NAT Middleboxes

After setting up an association, data can be exchanged between client and server. The packets are routed through the Internet. It must be emphasized that the routes are not stable and can change during the lifetime of an association, in particular if the association has a long life span as expected for major SCTP application scenarios. Therefore, a new NAT middlebox could become involved that has no knowledge of the properties of the association as shown in Fig. 5.

Passing through a new NAT middlebox also means that the server receives a packet with a new source address, which appears as if the client has an additional IP address.

In Fig. 5 the upper route shows the path where the association was set up initially. After the route was changed, the packets travel on the lower route. An example for the address/port combination for both routes is shown below the server.

If the new NAT middlebox receives the first packet from the client, it sends back a packet containing an ERROR chunk indicating that it lacks the required NAT table entry. Therefore, upon receipt of the ERROR chunk, the client sends an ASCONF chunk on the new path with the required information. The new NAT middlebox can add a complete entry to its table upon receipt of this message.

This message can pass through the NAT middlebox and can be acknowledged by the server with an ASCONF-ACK message. Afterward the communication can proceed as usual.
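The exchange with a newly appearing NAT middlebox can be sketched as a small Python simulation. The class `NewNat`, the function `send_via`, the packet dictionaries, and the field name `vtag_param` are hypothetical. The sketch also assumes that the client resends the original chunk once the state is installed, which the text does not spell out.

```python
class NewNat:
    """A NAT middlebox that has just appeared on the path and holds no state."""

    def __init__(self):
        # (local addr, local port, global addr, global port)
        #   -> (Local-VTag, Global-VTag)
        self.table = {}

    def outbound(self, pkt):
        key = (pkt["src"], pkt["sport"], pkt["dst"], pkt["dport"])
        if pkt["chunk"] == "ASCONF":
            # The common header carries the peer's tag (Global-VTag); the new
            # parameter carries the tag missing from the header (Local-VTag).
            self.table[key] = (pkt["vtag_param"], pkt["vtag"])
            return None  # forwarded towards the server
        if key not in self.table:
            return {"chunk": "ERROR", "m_bit": True,
                    "cause": "NAT state missing"}
        return None  # forwarded


def send_via(nat, assoc, chunk="DATA"):
    """Send one packet; on a middlebox ERROR, supply state with ASCONF and
    retry. Returns the chunk types the client put on the wire."""
    pkt = {"src": assoc["local_addr"], "sport": assoc["local_port"],
           "dst": assoc["peer_addr"], "dport": assoc["peer_port"],
           "vtag": assoc["peer_tag"], "chunk": chunk}
    sent = [chunk]
    reply = nat.outbound(pkt)
    if reply and reply["chunk"] == "ERROR" and reply["m_bit"]:
        nat.outbound(dict(pkt, chunk="ASCONF", vtag_param=assoc["own_tag"]))
        sent.append("ASCONF")
        nat.outbound(pkt)  # assumed retransmission of the original chunk
        sent.append(chunk)
    return sent


assoc = {"local_addr": "10.1.0.1", "local_port": 52001,
         "peer_addr": "100.4.5.1", "peer_port": 8080,
         "own_tag": 12345, "peer_tag": 45678}
```

The first call to `send_via` on a fresh `NewNat` puts DATA, ASCONF, and DATA on the wire; later calls send only DATA because the middlebox now holds a complete entry.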

Client Using Transport Layer Mobility

SCTP with its functionality of dynamic address configuration is well suited to be employed in an environment with host mobility. Whereas all other parameters remain the same, the moving client will receive a new address. This not only results in a new source address for the packet but also in a changing route, such that eventually another NAT middlebox must be traversed, which again, initially has no knowledge of the association. As the situation is similar to the one described in the last subsection, we suggest that the same actions are taken.

For more information on transport layer mobility, see [7].

Peer-to-Peer Communication

A greater challenge is the communication between two peers, that is, two hosts that both use private IP addresses (peer-to-peer communication). A detailed description for UDP and TCP is given in [8]. The two peers require an agent to help them find their communication partner. This agent usually is called a rendezvous server.

In Fig. 6 the corresponding network setup is shown. The communication process in this case consists of two phases. First, associations are initialized between the peers and the rendezvous server; after retrieving the required information from the rendezvous server, the peers can communicate with each other independently of the server. After both peers retrieve the required information, the actual communication between the peers can start. As there is no server, both hosts must be able to act as client and server. Thus, both start an association. If the message containing the INIT chunk of Peer 1 reaches the NAT middlebox, NAT 2, before the message of Peer 2 could arrive, it will be discarded. The retransmission of the INIT chunk will arrive if in the meantime, Peer 2 has punched a hole by triggering the NAT middlebox to set up a table entry. The best results can be achieved if the associations are started at the same time. From the perspective of SCTP, the simultaneous sending of INIT chunks also is not a normal situation because the INIT chunk is not followed directly by an INIT-ACK chunk but by another INIT chunk. The SCTP collision handling procedure ensures that exactly one association between the peers is established.
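The timing dependency of hole punching can be made concrete with a deliberately simplified Python model. It reduces each NAT middlebox to a set of remote addresses for which an outgoing packet has already created state; the class `HolePunchNat`, the function `send_init`, and the two public addresses are assumptions for illustration and ignore ports and verification tags.

```python
class HolePunchNat:
    """Simplified NAT: inbound packets pass only if an earlier outbound
    packet to the same remote address created a table entry (a hole)."""

    def __init__(self, public_addr):
        self.public_addr = public_addr
        self.holes = set()

    def outbound(self, remote_addr):
        self.holes.add(remote_addr)  # outgoing INIT sets up the entry

    def inbound(self, remote_addr):
        return remote_addr in self.holes


def send_init(src_nat, dst_nat):
    """Send an INIT from the peer behind src_nat to the peer behind dst_nat.
    Returns True if the INIT is delivered, False if dst_nat discards it."""
    src_nat.outbound(dst_nat.public_addr)
    return dst_nat.inbound(src_nat.public_addr)


nat1 = HolePunchNat("192.0.2.1")      # in front of Peer 1 (example address)
nat2 = HolePunchNat("198.51.100.1")   # in front of Peer 2 (example address)

print(send_init(nat1, nat2))  # Peer 1 first: discarded at NAT 2
print(send_init(nat2, nat1))  # Peer 2's INIT passes the hole in NAT 1
print(send_init(nat1, nat2))  # Peer 1's retransmission now passes NAT 2
```

The three calls mirror the sequence in the text: the first INIT is lost, the INIT in the opposite direction punches the hole, and the retransmission gets through.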

Multihomed Client and Server

The client sends an INIT chunk without a list of addresses to the server, which responds with an INIT-ACK chunk including a list of all addresses of the server. As shown in Fig. 7, this initial handshake uses the path via NAT 1.

After the association is established, the client adds its second address by sending an ASCONF chunk. If the packet containing this chunk is sent via the path containing NAT 2, both NAT middleboxes have the required state. If this packet is sent on the path via NAT 1, any packet sent from the client on the path via NAT 2 results in an ERROR chunk being sent back, and this triggers the sending of an ASCONF chunk. This chunk provides the required information to the NAT middlebox, NAT 2.

Figure 6. Peer-to-peer communication with rendezvous server. (Elements shown: rendezvous server, router, NAT 1, NAT 2, Peer 1, Peer 2, Peer 3, Peer 4.)

Figure 7. Multihoming through NAT middleboxes. (Elements shown: client, NAT 1, NAT 2, server. Message sequence: 1 INIT, 2 INIT-ACK, 3 COOKIE-ECHO, 4 COOKIE-ACK, 5 ASCONF with ADD-IP, 6 ASCONF-ACK.)

Multihomed Transport Layer Mobility

Previously, we discussed the procedure for a case when a client moves and hence changes its source address and the corresponding NAT middlebox as well. During the transition from one cell to another in a host mobility scenario, there is likely to be a zone where both cells are active, and thus, two addresses can be in use. Adding the new address results in a temporarily multihomed client. We propose to handle this situation in a way similar to the case explained in the last section. The new address is added by the sending of a message containing an ASCONF chunk. But as the old address is completely replaced by the new one as soon as the previous cell is left, another parameter must be added that indicates that the primary path should be set to the new address. This causes the server to send the next packets to the new address.

Multihoming with Rendezvous Server

The final step in increasing the complexity of the NAT scenario is the communication between two multihomed peers that are behind different NAT middleboxes.

Just like in the single-homed case, the rendezvous server must gather the peer information to fill its table. This time the table must be enlarged by the additional addresses. The peers first set up an association with the rendezvous server. Using this server the peers can obtain each other’s addresses and port numbers.

At this point, the peers must set up an association via initialization collision to provide a path by using hole punching. To also use the second path, the NAT middleboxes on the way must obtain the required information. By sending messages containing ASCONF chunks almost simultaneously, the NAT middleboxes are notified to allow packets arriving from the opposite direction to pass through. Unfortunately, the mechanism described earlier to request information by sending a message containing an ERROR chunk does not work when coming from the global side of the network because only the host behind the NAT middlebox can provide the data to fill the NAT table. So when the message containing an ASCONF chunk arrives at the opposite NAT middlebox before a hole is punched, the packet is discarded, but its retransmission might be successful. After both NAT tables receive the appropriate entries, the secondary paths also can be used.

Conclusion

In this article, we proposed a comprehensive solution for the support of SCTP in NAT middleboxes. We motivated the necessity for a specific NAT concept with NAPT functionality, where the verification tags provided by SCTP are used to distinguish between associations. The NAT middleboxes can request information from the SCTP end points and give hints to improve the overall procedure.

Furthermore, several scenarios were analyzed to explain the manipulation of the NAT table in single-homed, multihomed, and mobility environments. The peer-to-peer communication with a preregistration was taken into account as well.

Generalizing the SCTP-specific variant of NAT, the following is important. For supporting a transport protocol with multipath support, a connection identifier makes connection tracking possible without a requirement to rely on the port numbers. This avoids the requirement of changing the port numbers and possibly synchronizing them between different NAT middleboxes. A feature of dynamic address reconfiguration can be used to avoid having IP addresses in the transport layer, which is problematic for the processing in NAT middleboxes. For peer-to-peer communications, it is helpful if the transport layer supports simultaneous connection setups. Finally, it might be preferable to use simple algorithms involving random numbers with a small chance of collision instead of more complex deterministic algorithms without collision.

The solution presented in this article will be included in a future version of our Internet drafts to be considered for standardization in the BEHAVE working group of the IETF.

References

[1] Q. Xie et al., “SCTP NAT Traversal Considerations,” draft-xie-behave-sctp-nat-cons-03.txt (work in progress), Nov. 2007.
[2] R. Stewart and M. Tüxen, “Stream Control Transmission Protocol (SCTP) Network Address Translation,” draft-stewart-behave-sctpnat-03.txt (work in progress), Nov. 2007.
[3] M. Tüxen and R. Stewart, “UDP Encapsulation of SCTP Packets,” draft-tuexen-sctp-udp-encaps-02.txt (work in progress), Nov. 2007.
[4] P. Srisuresh and M. Holdrege, “IP Network Address Translator (NAT) Terminology and Considerations,” RFC 2663, Aug. 1999.
[5] R. Stewart, “Stream Control Transmission Protocol,” RFC 4960, Sept. 2007.
[6] R. Stewart et al., “Stream Control Transmission Protocol (SCTP) Dynamic Address Reconfiguration,” RFC 5061, Sept. 2007.
[7] M. Riegel and M. Tüxen, “Mobile SCTP Transport Layer Mobility Management for the Internet,” Proc. SoftCOM 2002, Int’l. Conf. Software, Telecommunications and Computer Networks, Split, Croatia, 2002, pp. 305–09.
[8] B. Ford and P. Srisuresh, “Peer-to-Peer Communication across Network Address Translators,” USENIX Annual Technical Conf., Anaheim, CA, Apr. 2005.
[9] R. Stewart and Q. Xie, Stream Control Transmission Protocol (SCTP): A Reference Guide, Addison-Wesley, Oct. 2001.
[10] I. Rüngeler, M. Tüxen, and E. Rathgeb, “Integration of SCTP in the OMNeT++ Simulation Environment,” Int’l. Developers Wksp. OMNeT++ (OMNeT++ 2008), Mar. 2008.

Biographies

ERWIN P. RATHGEB ([email protected]) received his Dipl.-Ing. and Ph.D. degrees in electrical engineering from the University of Stuttgart, Germany, in 1985 and 1991, respectively. He has been a full professor at the University Duisburg-Essen since 1999 and holds the Alfried Krupp von Bohlen und Halbach Chair for Computer Networking Technology at the Institute for Experimental Mathematics. From 1991 to 1998 he held various positions at Bellcore, Bosch Telekom, and Siemens. His current research interests include concepts and protocols for next-generation Internets with a focus on network security. He is a member of IFIP, GI, and ITG, where he is chairman of the expert group on network security.

IRENE RÜNGELER ([email protected]) received her diplomas in computer science and economics at the University of Hagen in 1992 and 2000, respectively. She joined the Münster University of Applied Sciences in 2002, where she works as a research staff member. Her research interests include innovative transport protocols, especially SCTP, and their performance analysis, signaling transport over IP-based networks, and fault-tolerant systems.

RANDALL STEWART ([email protected]) works for TRG Holdings as chief development officer. His current duties include integrating software solutions for call center applications using both SCTP and RSerPool. Previously, he was a distinguished engineer at Cisco Systems. He also has worked for Motorola, NYNEX S&T, Nortel, and AT&T Communications. Throughout his career he has focused on operating system development, fault tolerance, and call-control signaling protocols. He is also a FreeBSD committer with responsibility for the SCTP reference implementation within FreeBSD.

MICHAEL TÜXEN ([email protected]) studied mathematics at the University of Göttingen and received a Dipl.Math. degree in 1993 and a Dr.rer.nat. degree in 1996. He has been a professor in the Department of Electrical Engineering and Computer Science of Münster University of Applied Sciences since 2003. In 1997 he joined the Systems Engineering group of ICN WN CS of Siemens AG in Munich. His research interests include innovative transport protocols, especially SCTP, IP-based networks, and highly available systems. At the IETF, he participates in the Signaling Transport, Reliable Server Pooling, and Transport Area Working Groups.


Distributed Connectivity Service for a SIP Infrastructure

Luigi Ciminiera, Guido Marchetto, Fulvio Risso, and Livio Torrero, Politecnico di Torino

Abstract

Because of the constant reduction of available public network addresses and the necessity to secure networks, middleboxes such as network address translators and firewalls have become quite common. Because they are designed around the client-server paradigm, they break connectivity when protocols based on different paradigms are used (e.g., VoIP or P2P applications). Centralized solutions for middlebox traversal are not an optimal choice because they introduce bottlenecks and single points of failure. To overcome these issues, this article presents a distributed connectivity service solution that integrates relay functionality directly in user nodes. Although the article focuses on applications using the Session Initiation Protocol, the proposed solution is general and can be extended to other application scenarios.

Although end-to-end direct connectivity was a must in the early days of the Internet, currently, increasing numbers of hosts are connected through middleboxes such as network address translators (NATs) that enable the reuse of private addresses and/or firewalls, which are used to secure corporate networks and internal resources. These devices work seamlessly in case of client-server applications (although the client must reside in the “protected” part of the network), but they limit the end-to-end connectivity of the applications that use different paradigms, such as voice over IP (VoIP) and peer-to-peer (P2P). In particular, middleboxes prevent nodes behind them from being contacted directly from external nodes. For example, an internal host might not have a problem starting a data transfer to an external host, but the reverse (e.g., an incoming VoIP call) may be impossible. Thus, proper strategies for middlebox traversal are required to enable the seamless communication between hosts, no matter where they are located. Among the known strategies, hole punching and relaying [1] represent the ones that are used most frequently. The common idea is to make the middlebox function as if the internal host begins the communication. The middlebox then creates a temporary channel with the remote host, thus allowing the delivery of external packets. In particular, the hole punching forces each internal host to maintain a persistent connection with an external rendezvous server located on the public Internet. This creates a type of “hole” that can be used by an external host to contact the internal host directly. If hole punching fails, for example, if hosts are behind symmetric NATs, the relaying represents the last chance: internal hosts maintain a persistent connection with an external node (the relay server), which operates as a forwarder, that is, it receives all packets directed to the internal host and redirects them to it. This solution requires that the internal host advertises the IP address of the relay server as one of its addresses, and that it instructs the relay server with the proper forwarding rules.

This article focuses on the problem of middlebox traversal for applications using the Session Initiation Protocol (SIP) [2], which is among the protocols that suffer most from middlebox limitations. Two solutions were defined in this context. SIP messages directed to the destination user agent (UA) are delivered with a relay-based approach that exploits an intermediate public SIP proxy [3]. For media flows, the Interactive Connectivity Establishment (ICE) [4] protocol was proposed. ICE is an integrated solution defined to discover NAT bindings and to execute the hole punching for media streams. In addition, ICE also supports media relaying based on the Traversal Using Relay around NAT (TURN) [5] protocol. Both the hole-punching mechanism of ICE and TURN rely on simple traversal of UDP through NAT (STUN) [6], a client-server protocol consisting of two messages, Binding Request and Binding Response. These messages are sufficient for implementing the hole-punching procedure [1], whereas TURN must extend the STUN protocol to establish communication channels with relays, called TURN servers. STUN also can be used to implement a middlebox behavior discovery service [7] that can be used by internal hosts to determine the type of NAT/firewall they are behind.

Current middlebox traversal solutions rely on centralized servers that provide rendezvous and relay capabilities. However, the centralized server is a single point of failure: if the server fails, all UAs behind middleboxes become unreachable. Furthermore, a centralized solution cannot scale to an IP-based telecommunication provider with millions of customers, in which servers may be required to handle a huge amount of traffic (both SIP signaling messages and media datagrams), thus requiring a large amount of computational resources and bandwidth. The server acting as relay for SIP (i.e., the SIP proxy) also must handle the traffic generated by keep-alive messages that UAs behind the middlebox periodically send to it. Keep-alive messages are required to maintain the communication channel with the server and thus to guarantee that these UAs can always be reached. This could result in a high overhead. For example, according to the NAT binding timeout reported in [3], in a SIP domain including 1.5 million UAs with limited connectivity, the central server must handle about 50,000 keep-alive messages per second.
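The two figures quoted in this example are related by a simple rate calculation, shown here as a short Python sketch. The function names are hypothetical, and the 30-second interval is derived from the two numbers in the text rather than stated there; it assumes every UA sends exactly one keep-alive per interval.

```python
def keepalive_rate(num_uas, interval_s):
    """Messages per second when every UA sends one keep-alive per interval."""
    return num_uas / interval_s


def implied_interval(num_uas, rate_per_s):
    """Keep-alive interval (seconds) implied by a given aggregate rate."""
    return num_uas / rate_per_s


print(implied_interval(1_500_000, 50_000))  # 30.0
print(keepalive_rate(1_500_000, 30))        # 50000.0
```

The load on the central server grows linearly with the number of UAs and inversely with the keep-alive interval, which is why distributing the relay role spreads this traffic as well.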

This article proposes a distributed architecture, referred to as DIStributed COnnectivity Service (DISCOS), for ensuring connectivity across NATs and firewalls in a SIP infrastructure. This solution overcomes the limitations of the current centralized solution by creating a gossip-based P2P network and integrating the previously described rendezvous and relay functionalities in the UAs. Each globally reachable UA with enough resources can provide such services to UAs with limited connectivity. A major emphasis is given to the overlay design, as it is a key point for ensuring a fast “service lookup” (i.e., to find a peer that still has enough resources for offering the connectivity service), which is instrumental for providing an adequate quality of service to the users. In particular, we show how a scale-free topology can fit this requirement, and we propose an overlay construction model that can be used to build such topology.

DISCOS is somewhat orthogonal to P2P-SIP [8], although both are based on P2P technologies. In fact, P2P-SIP is a solution mainly for distributed lookup, whereas DISCOS offers a solution for middlebox traversal.

The idea of distributing such functionalities among end systems is also one of the characteristics of Skype, a well-known VoIP application. However, Skype uses secret and proprietary protocols that cannot be studied and evaluated by third parties, therefore limiting the ability to understand exactly how these problems are solved. For example, in the Skype analysis presented in [9] and [10], the authors could give only partial explanations about its NAT and firewall traversal mechanisms. Their experiments pointed out that nodes with enough resources can become supernodes and provide support for NAT and firewall traversal. In particular, they offer relay functionalities and probably run a sort of STUN server that other nodes use to discover the presence (and to determine the type) of NAT and firewall in front of them. Therefore, it is clear that a node behind NAT must connect to a supernode to be part of the Skype network, but no information could be provided about the super node discovery and selection policies. Also, super node overlay topology is almost completely unknown. Thus, there is no way to evaluate the effectiveness of these solutions. On the other hand, here we propose a distributed architecture for middlebox traversal whose scalability and robustness are discussed and evaluated. In addition, the solution was engineered and validated by simulation on a SIP infrastructure, but the solution is more general, and it can be seen as a mechanism to cope with middlebox traversal, thus opening the path to a wider adoption.

Operating Principles

Distributed Connectivity Service

DISCOS extends current centralized NAT and firewall traversal solutions by distributing rendezvous and relay functionalities among UAs. Relaying and hole-punching service for media flows is implemented by integrating a STUN/TURN server in each UA. The TURN server also is used to support relaying SIP messages. However, DISCOS can be modified easily to offer the relaying of SIP messages by integrating SIP proxy functionalities in each UA, leading to a distributed implementation of [3].

A UA with enough resources (e.g., a public network address, a wideband Internet connection, and free CPU cycles) becomes what we define as a connectivity peer and starts to offer a connectivity service. In particular, connectivity peers can act as both SIP relay (leveraged by UAs with limited connectivity for receiving SIP messages) and media relay. Connectivity peers also can offer support to the hole-punching procedure for media session establishment, thus operating as a distributed rendezvous server. In addition, connectivity peers also provide support for middlebox behavior discovery [7]. UAs with limited connectivity can locate and attach to an available peer whenever they require one of these services.

Connectivity peers are organized in a P2P overlay, and their knowledge is spread through proper advertisement messages, thus building an unstructured gossip-based network. Structured networks, characterized by additional overhead due to the maintenance of the structure, are not considered because their excellent lookup properties are not required. In fact, DISCOS uses the overlay to find only the first available connectivity peer and not for locating a precise resource.

Note that because DISCOS distributes existing middlebox traversal functionalities among peers, it is also totally compatible with current middleboxes and their traversal techniques. This enables a smooth deployment of the proposed solution.

Overlay Topology

In order to enable DISCOS to locate an available peer for UAs with limited connectivity in the shortest time possible, peers should have a deep knowledge of the network: the greater the number of known peers, the higher the probability of finding an available peer in a short time, especially if known peers are lightly loaded. In gossip-based networks, the spread of information is based on flooding, thus the overlay topology has a deep impact on the network efficiency. For instance, the greater the average path length between nodes, the higher the depth of the flooding (hence the load on the network) that is required for an adequate spread of the information. Thus, an overlay topology that ensures a small average path length is required. However, this is not sufficient for enabling peers to know a large set of suitable connectivity peers from which to choose when a UA asks for the connectivity service. In fact, nodes maintain a cache that should be kept small to reduce the overhead required to manage all the entries. This limits the number of peers known at each instant. The limited cache size can be compensated by frequently refreshing its contents so that the set of known peers changes frequently, resulting in a sort of round robin among peers: different connectivity peers can always be provided to UAs that request the service at different instants, thus increasing the opportunity for a queried connectivity peer to suggest available ones when it cannot provide the service itself. Frequent cache refresh also is useful for ensuring that nodes store up-to-date information about existing peers. Such a policy can be efficiently adopted if the overlay results in a scale-free network [11], an interesting topology that ensures small average path length and features scalability and robustness. In a scale-free network, few nodes (referred to in the following as hubs) have a high degree, whereas the others have a low one.

The degree of a node is the sum of all its incoming (i.e., the in-degree) and outgoing (i.e., the out-degree) links. In the DISCOS overlay, the out-degree of a node is limited by the cache size whereas the in-degree is the number of other peers that have that node in their cache. Thus, nodes can be considered hubs when they are in the cache of several peers, that is, when they are highly popular. Hubs frequently receive advertisement messages from a large set of different nodes, so they frequently update their cache. In particular, if advertisement messages contain nodes that are low in popularity, hubs can discover peers, which being low in popularity, are lightly loaded with high probability. The key is to make searches through hubs because they potentially know a large variety of lightly loaded peers. Thus, the proposed solution essentially exploits (and generalizes to the case of a single resource provided by many nodes) the results achieved by Adamic et al. [12] about random walk searches in unstructured P2P overlays. They demonstrated that searches in scale-free networks are extremely scalable (their cost grows sublinearly with the size of the network), also proving that searches toward hubs perform better than random searches because hubs have pointers to a larger number of resources. In DISCOS, the benefit of searching through hubs comes from the high frequency with which pointers to connectivity peers change in their cache. These properties are obtained at the expense of a non-uniform distribution of the number of messages handled by nodes: the higher the popularity of a node, the larger the number of advertisement messages received. However, a proper hub selection policy and a reasonable advertisement rate could mitigate the effects of this disparity. These aspects are analyzed in more detail in the following section.
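The degree definitions used here can be written down as a short Python sketch. It takes a global view of all caches, which no real peer has, purely to make the definitions concrete; the function names `degrees` and `hubs` and the `min_in_degree` threshold are assumptions for this example.

```python
from collections import Counter


def degrees(caches):
    """caches maps each peer to the set of peers held in its cache.
    Out-degree is the cache occupancy; in-degree is the number of other
    caches that contain the peer."""
    in_deg = Counter(p for cache in caches.values() for p in cache)
    return {peer: {"out": len(cache), "in": in_deg[peer]}
            for peer, cache in caches.items()}


def hubs(caches, min_in_degree):
    """Peers whose in-degree reaches the (assumed) popularity threshold."""
    return sorted(p for p, d in degrees(caches).items()
                  if d["in"] >= min_in_degree)


caches = {"A": {"B", "C"}, "B": {"C"}, "C": {"A"}, "D": {"C"}}
print(degrees(caches)["C"])  # {'out': 1, 'in': 3}
print(hubs(caches, 3))       # ['C']
```

In this toy overlay peer C sits in three caches and is the only hub, while D is in no cache at all and would therefore be lightly loaded.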

The Barabási-Albert model [11] was proposed to create scale-free graphs. In this model, a few nodes are available from the start, and when a new node arrives, it connects to one of the existing nodes with a probability that is proportional to the popularity (degree) of that node (preferential attachment); in other words, the model assumes global knowledge of nodes and their degrees, which is clearly inapplicable in a real network scenario. A first step to implement such a model in our overlay is to make M peers available to other nodes through a bootstrap service. When a node joins the overlay for the first time, it queries the bootstrap service for a subset of these M registered nodes. However, preferential attachment is not possible with the mechanism described so far because all incoming peers:
• Can learn only the nodes provided by the bootstrap service
• Cannot compute the popularity of a node

An adequate spread of the network knowledge can address the first issue, but there is no way to enable a node to learn the in-degree (i.e., the precise metric of node popularity) of the others. In our case, the popularity is computed autonomously by each node through a simple approximate metric based on the number of received advertisement messages that contain such a node. In our approximate model, preferential attachment is implemented by forcing peers to evaluate the popularity of nodes through the previously mentioned mechanism and then to include some of the most popular peers in the advertisement messages they send. This allows nodes to insert highly popular peers (hubs) in their cache, thus building and maintaining the scale-free topology. In summary, new nodes use the peers known through the bootstrap service as “bootstrap” nodes; then they learn the most popular ones through the received advertisement messages and start to perform preferential attachment. Furthermore, incoming nodes that already know peers discovered during their previous visits can avoid the bootstrap procedure by attaching directly to them. The resulting topology is shown in Fig. 1.
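The approximate popularity metric described above can be sketched as a counter of mentions in received advertisements. The class and method names are hypothetical, and breaking ties by name is an assumption made only to keep the output deterministic.

```python
from collections import Counter

class PopularityEstimator:
    """Approximates peer popularity by counting mentions in received advertisements."""

    def __init__(self):
        self.mentions = Counter()

    def on_advertisement(self, advertised_peers):
        # Every peer listed in a received advertisement gains one popularity point.
        self.mentions.update(advertised_peers)

    def most_popular(self, n):
        # Ties are broken by name so that the result is deterministic.
        ranked = sorted(self.mentions, key=lambda p: (-self.mentions[p], p))
        return ranked[:n]

est = PopularityEstimator()
for adv in (["a", "b"], ["a", "c"], ["a", "b", "d"]):
    est.on_advertisement(adv)
print(est.most_popular(2))  # ['a', 'b']
```

Only local observations are used, so the estimate needs none of the global knowledge that the Barabási-Albert model assumes.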

It is worth noting that different bootstrap services can be used to create disjoint overlays because joining peers that fetch nodes from different bootstrap services start to exchange advertisement messages with different connectivity peers. This makes it possible to deploy different DISCOS overlays in different geographical areas of a SIP domain. If a location-aware bootstrap service selection policy is adopted, users can find a connectivity peer that is close to them, thus preserving the user-relay latency achieved by current centralized solutions, where different servers can be used at different locations.

Figure 1. DISCOS overlay topology. Legend: connectivity peer (CP); highly popular connectivity peer (hub). Annotations: the UA behind NAT (UA A) queries a node (possibly a hub) for service; a joining connectivity peer with no entries in cache queries the bootstrap service for some hubs.

The implementation of the bootstrap service is highly customizable. A possible solution consists in deploying M static peers and preconfiguring their addresses on each UA. A more flexible approach (considered in the following) consists in deploying multiple bootstrap servers reachable through appropriate Domain Name System service (DNS SRV) location entries configured in the DNS. Each bootstrap server stores information about M connectivity peers that spontaneously register themselves when they join the overlay. Multiple bootstrap servers are deployed for redundancy and load balancing purposes. Proper DNS configuration can enable a location-aware bootstrap service selection.
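A bootstrap server of the kind described above can be sketched as a bounded registry. The class name, the method names, and the choice of returning a random subset are assumptions for illustration; the article only states that a joining node obtains a subset of the M registered peers.

```python
import random

class BootstrapServer:
    """Keeps at most M self-registered connectivity peers (M is the bound from the text)."""

    def __init__(self, max_peers):
        self.max_peers = max_peers
        self.peers = []

    def register(self, address):
        # A peer adds itself only while fewer than M addresses are registered.
        if address in self.peers or len(self.peers) >= self.max_peers:
            return False
        self.peers.append(address)
        return True

    def fetch(self, count, rng=random):
        # Return a random subset of the registered peers to a joining node (assumed policy).
        return rng.sample(self.peers, min(count, len(self.peers)))

server = BootstrapServer(max_peers=2)
print([server.register(a) for a in ("10.0.0.1", "10.0.0.2", "10.0.0.3")])
# [True, True, False]
```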

Protocol Overview

Whenever a UA joins the SIP domain, it must determine whether it can become a connectivity peer or whether it is behind a middlebox. This is done by contacting a connectivity peer and exploiting its STUN functionalities [7]. If the UA does not know an active peer, it first performs the bootstrap procedure described above. The flow chart of the join procedure is shown in Fig. 2a.

If the UA can become a connectivity peer, it checks the number of addresses registered on each bootstrap server, and if that number is smaller than a fixed bound M, it adds itself to the list. Then, it sends an advertisement message to the known peers to announce itself. The UA is now part of the DISCOS overlay, and it starts receiving messages from other nodes, thus gradually filling its cache with new peers. A proper peer advertisement policy is adopted to implement preferential attachment (thus building and maintaining the scale-free topology) and to enable caches to be refreshed with lightly loaded peers (thus having potential nodes available for the service). In particular, advertisement messages include the sender node, the two most popular peers it knows (enabling preferential attachment), and the two least popular peers it knows (spreading the knowledge of lightly loaded peers).
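The advertisement policy above can be sketched as follows. The dictionary field names are hypothetical, the default TTL of 2 is the value used later in the simulations, and the popularity scores are made-up inputs.

```python
def build_advertisement(sender, popularity, ttl=2):
    """Compose an advertisement: sender, two most and two least popular known peers."""
    ranked = sorted(popularity, key=lambda p: (-popularity[p], p))
    most = ranked[:2]
    # Avoid listing a peer twice when fewer than four peers are known.
    least = [p for p in ranked[-2:] if p not in most]
    return {"sender": sender, "most_popular": most, "least_popular": least, "ttl": ttl}

known = {"a": 9, "b": 7, "c": 4, "d": 2, "e": 1}
adv = build_advertisement("me", known)
print(adv["most_popular"], adv["least_popular"])  # ['a', 'b'] ['d', 'e']
```

Listing both ends of the popularity ranking serves the two goals named in the text: the popular entries drive preferential attachment, and the unpopular ones spread knowledge of lightly loaded peers.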

Figure 2. Operation of DISCOS when: a) a node joins the SIP domain; b) a node in the overlay receives an advertisement message; c) a node performs a SIP/media relay lookup.

Advertisement messages are periodically sent by peers to all nodes they have in their cache and contain a special time-to-live (TTL) field that allows the message to cross N hops: as soon as the message is received, the TTL value is decremented, and if it is still positive, the recipient sends another message to all the nodes in its cache. Every time a peer receives an advertisement message, it updates its cache by increasing the popularity of nodes already present and by inserting the new ones. As previously described, it is important for a node to have both hubs and peers of low popularity in its cache. Thus, a proper cache management policy is also adopted if the cache is full: the node with average popularity is removed before the insertion, resulting in a cache that favors big hubs and peers of low popularity. Figure 2b details the operations of a peer when it receives an advertisement message.
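The cache update and TTL handling described above can be sketched as follows. Two points are assumptions rather than details from the article: a newly inserted peer starts with a popularity score of 1, and the "node with average popularity" is taken to be the peer in the middle of the popularity ranking.

```python
def update_cache(cache, advertised, capacity=10):
    """Apply one advertisement to a cache mapping peer -> popularity score."""
    for peer in advertised:
        if peer in cache:
            cache[peer] += 1          # already known: increase its popularity
            continue
        if len(cache) >= capacity:
            # Cache full: drop the peer in the middle of the popularity ranking,
            # so that big hubs and lightly loaded peers both survive.
            ranked = sorted(cache, key=lambda p: (-cache[p], p))
            del cache[ranked[len(ranked) // 2]]
        cache[peer] = 1               # assumed initial score for a new peer
    return cache

def should_forward(ttl):
    """Decrement the TTL on reception; forward only while it stays positive."""
    ttl -= 1
    return ttl > 0, ttl

cache = {"hub": 8, "mid": 4, "low": 1}
update_cache(cache, ["hub", "new"], capacity=3)
print(sorted(cache.items()))  # [('hub', 9), ('low', 1), ('new', 1)]
print(should_forward(2))      # (True, 1)
```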

UAs with limited connectivity behave differently because they essentially exploit DISCOS features to find SIP relays (they choose a connectivity peer as relay for SIP messages as soon as they join the SIP domain and select another when the current one disappears) and media relays (when they need one to establish a media session). A UA with limited connectivity performs these lookups by contacting the most popular peers in its list, each of which can accept or decline the request. If a peer declines, it includes in the answer the two least popular peers and the most popular peer it knows: the least popular peers are queried immediately (since they are supposed to be free enough to provide connectivity), whereas the most popular is inserted in the cache (because, being probably a hub, it can speed up later searches). If both queried peers refuse to provide the service, another node is picked from the cache, and the procedure is repeated. If all the nodes in the cache have been queried without success, one of two policies is applied, depending on the type of service required: in the case of a lookup for a SIP relay, the UA waits for a random time and then repeats the procedure; in the case of a lookup for a media relay, the procedure is stopped, and the media session cannot be established. The relay lookup procedure is shown in Fig. 2c.
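The lookup procedure above can be sketched as a loop over the cache. The `ask` callback is a hypothetical stand-in for the network query, timeouts are not modeled, and giving a newly learned hub a score of 1 is an assumption.

```python
def find_relay(cache, ask):
    """Search for a relay. `cache` maps peer -> popularity; `ask(peer)` is a
    hypothetical query returning (accepted, two_least_popular, most_popular)."""
    visited = set()
    while True:
        candidates = [p for p in cache if p not in visited]
        if not candidates:
            return None                      # every cached peer was queried
        peer = max(candidates, key=lambda p: (cache[p], p))
        visited.add(peer)
        accepted, least_popular, hub = ask(peer)
        if accepted:
            return peer
        if hub is not None:
            cache.setdefault(hub, 1)         # remember the suggested hub
        for candidate in least_popular:      # lightly loaded peers: query now
            if candidate in visited:
                continue
            visited.add(candidate)
            if ask(candidate)[0]:
                return candidate

# Made-up replies: the hub declines and suggests two lightly loaded peers.
replies = {
    "hub": (False, ["x", "y"], "hub2"),
    "x": (False, [], None),
    "y": (True, [], None),
}
print(find_relay({"hub": 5}, lambda p: replies[p]))  # y
```

A `None` result corresponds to the case in which all cached nodes were queried without success; the caller then either retries after a random wait (SIP relay) or gives up (media relay).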

UAs with limited connectivity also receive ad hoc messages from their relays containing three highly popular peers that allow them first to fill, and then to update, their cache with new hubs. This enables them to direct searches toward hubs when they require a connectivity peer. Broken hubs (e.g., because of a network failure) are detected through a timeout: if a hub does not reply to a query, the UA can query one of the other hubs in its cache. If no peers are available, the UA again fetches the registered ones from the bootstrap server; however, this situation is unlikely to occur because UAs with limited connectivity periodically receive new hubs from their SIP relays.

This protocol could be integrated in SIP or implemented separately. The former approach is more straightforward because it simply consists in defining new SIP header fields. The latter is more efficient, especially concerning the message size: the human-readable nature of SIP messages would result in advertisement messages of about 800 bytes.

Security Issues

The deployment of a P2P architecture for providing connectivity service raises several security issues that differ from those of centralized solutions. In DISCOS, as in many other distributed systems, controlling the consequences of malicious node behavior can be more difficult than in the centralized counterpart. Much effort has been expended in recent years in investigating these issues in the context of P2P-SIP overlays [8, 13, 14], which must deal with similar concerns because they replace centralized SIP proxies for user location. Some solutions were proposed and can be seamlessly applied in DISCOS. For example, in [14], public key certificates are distributed among users to enable them to verify the origin and the integrity of messages. Analogously, certificates can be used in DISCOS to authenticate advertisement messages, so that they can be considered trusted. This limits the operation of malicious peers because they can be easily traced. This and other P2P-SIP derived security policies certainly require further improvement to better fit specific DISCOS requirements. However, we are confident that effective results can be obtained with minimal modifications because, as mentioned previously, the security issues that must be addressed are similar in the two environments. This additional effort is left for future work.

Overlay Simulation

Simulations Background

We developed a custom, event-driven simulator to evaluate the effectiveness of the proposed solution. In particular, we were interested in proving its scalability and validating its algorithms. Thus, we implemented a simulator supporting the following four operations: node arrival/departure, media session setup/teardown, SIP relay lookup (triggered when a node with limited connectivity joins the network or when its current SIP relay disappears), and media relay lookup (which occurs when a node requires a relay to perform a media session).

Simulations refer to a single SIP domain. Node arrivals and call occurrences are modeled using a Poisson process, whereas node lifetime and call length are extracted from real Skype traffic coming from/to the network of the university campus to approximate the behavior of real VoIP networks. With our parameters, the average number of nodes in the network depends on their arrival rate because of the effect of the Poisson arrivals model coupled with the lifetime distribution of Skype. For example, an arrival rate λ_N = 100 nodes/minute leads to a network consisting, on average, of 30,000 nodes, which is the standard size in our simulation and is a good trade-off between simulation length (some runs last several days on a Dual Xeon 3 GHz processor) and significance of results. To test our solution under different traffic load scenarios, three different rates are used for media session occurrences: 1.4 λ_N, 5 λ_N, and 20 λ_N sessions/minute. These values, coupled with the distribution of the Skype call duration, lead to 10 percent, 30 percent, and 98 percent of nodes simultaneously involved in a media session, respectively.
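The relation between arrival rate and average network size quoted above is consistent with Little's law. As a rough check, assuming steady state, the implied mean node lifetime (a value the article does not state, so it is inferred here rather than measured) follows from:

```latex
\bar{N} = \lambda_N \, \bar{T}
\quad\Longrightarrow\quad
\bar{T} = \frac{\bar{N}}{\lambda_N}
        = \frac{30\,000\ \text{nodes}}{100\ \text{nodes/minute}}
        = 300\ \text{minutes}
```

Here \bar{N} is the average number of nodes in the network, \lambda_N the node arrival rate, and \bar{T} the mean time a node stays in the network.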

Statistics presented in [15] show that about 74 percent of hosts are behind a NAT. In addition, [1] shows that hole punching is successful in about 82 percent of the cases. To the best of our knowledge, no detailed information is available about firewall proliferation over the Internet. On the strength of the available data, we consider for simulation a network scenario where nodes have limited connectivity with probability P_LC = 0.74 and media sessions directed to these nodes require relaying with probability P_MR = 0.18. Whenever a node joins the SIP domain, one of two actions is performed at simulation level: if it is tagged as a node with limited connectivity (with probability P_LC), it triggers a SIP relay lookup; otherwise it joins the DISCOS overlay as a connectivity peer. Media sessions are possible between each pair of nodes (selected randomly). When a node behind a NAT is contacted, a media relay lookup is triggered by this node with probability P_MR.
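The two probabilities above can be turned into a minimal simulation sketch. The function names and the random seed are illustrative assumptions; the value of P_MR is simply the complement of the hole punching success rate reported in [1].

```python
import random

P_LC = 0.74            # probability that a node has limited connectivity [15]
P_MR = 1.0 - 0.82      # hole punching succeeds in about 82 percent of cases [1]

def on_join(rng):
    """Decide what a joining node does in the simulation."""
    return "sip_relay_lookup" if rng.random() < P_LC else "join_overlay"

def needs_media_relay(callee_behind_nat, rng):
    """A media relay lookup is triggered only for callees behind a NAT."""
    return callee_behind_nat and rng.random() < P_MR

rng = random.Random(42)   # fixed seed, chosen arbitrarily for repeatability
joins = [on_join(rng) for _ in range(10_000)]
print(round(P_MR, 2))                                # 0.18
print(joins.count("sip_relay_lookup") / len(joins))  # close to 0.74
```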

The number of UAs with limited connectivity to which a peer can simultaneously provide SIP relay service is set to 10; advertisement messages have a TTL equal to 2; their sending interval is set to 60 minutes; and the cache of a peer is assumed to hold 10 entries. Furthermore, the number of peers registered in the bootstrap server (which is assumed to be unique and reachable by nodes) is set to 20. Each simulation runs long enough to leave the transient period; the results presented refer to the steady state.

Overlay Topology Evaluation

First, simulation aims at demonstrating that our protocol creates a scale-free network among connectivity peers. In particular, we consider the clustering coefficient and the in-degree of nodes [11]. The clustering coefficient of a node is defined as the number of links between its neighboring nodes divided by the number of links that could possibly exist between them. To be scale-free, an overlay must have an average clustering coefficient higher than that of a random graph obtained under the same conditions, which is clearly shown in Fig. 3a. In detail, the average clustering coefficient for DISCOS decreases when the network size grows, asymptotically converging to a value that is about 20 times the clustering coefficient of a random graph. We also verified that, at all network sizes tested, the coefficient remains almost constant over time. Concerning the in-degree, the requirement to be met is that the distribution of node degree follows a power law $P(k) = c\,k^{-\gamma}$, where $P(k)$ is the probability that a node has $k$ connections, and $c$ is a normalization factor. Figure 3b shows that the distribution of in-degree values obtained through simulation fits well a power law with $c = 0.7$ and $\gamma = 1.5$. These tests validate our overlay construction model, showing that the resulting topology really evolves into a scale-free network.
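The two metrics used in this evaluation can be computed as in the following sketch. The example graph is hypothetical and stored as symmetric adjacency sets; the default parameters of the power law are the fitted values reported for Fig. 3b.

```python
from itertools import combinations

def clustering_coefficient(graph, node):
    """Links among the neighbors of `node` divided by the links that could exist."""
    neighbors = graph[node]
    if len(neighbors) < 2:
        return 0.0
    possible = len(neighbors) * (len(neighbors) - 1) / 2
    actual = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    return actual / possible

def power_law(k, c=0.7, gamma=1.5):
    """P(k) = c * k**(-gamma), with the fitted values reported for Fig. 3b."""
    return c * k ** -gamma

# Hypothetical undirected graph stored as symmetric adjacency sets.
graph = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(clustering_coefficient(graph, "a"))  # 0.3333333333333333
print(power_law(4))                        # 0.0875
```

Node "a" has three neighbors and only one of the three possible links among them (b to c) exists, which gives the coefficient of one third.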

Figure 3. Simulation results: a) average clustering coefficient evaluation (average clustering coefficient vs. network size in nodes; DISCOS overlay and random graph); b) in-degree power law distribution (fraction of nodes vs. in-degree; DISCOS observations and power law with c = 0.7, γ = 1.5); c) average number of contacted peers to find a SIP relay (contacted nodes vs. network size in nodes; DISCOS overlay and random spread and lookup); d) media session failure probability vs. number of allocated backup relays; e) average number of peers contacted to allocate K relays (contacted nodes vs. number of allocated relays); f) bandwidth consumption distribution (fraction of nodes vs. number of media flows per relay). Panels d) to f) show curves for 10, 30, and 98 percent of nodes involved in a call.

To prove the effectiveness of the DISCOS topology, we compare our solution with a distributed system where the information is randomly spread, and the nodes to query during lookup procedures are randomly chosen among peers in the cache. Figure 3c depicts the average number of peers that must be contacted to reach an available SIP relay for both DISCOS and the randomized overlay. Although the advertisement rate and the TTL value remain the same, the figure shows that in DISCOS, the number of peers contacted is appreciably lower. Furthermore, the ratio between the performance obtained by the two policies increases with the network size, thus demonstrating the scalability properties of our solution.

These tests prove the effectiveness and the scalability of DISCOS. In particular, results show how the scale-free topology ensures overlay efficiency with a limited message rate (each peer sends an advertisement message every 60 minutes), a small TTL (equal to 2), and a limited cache size (10 entries). We also evaluated the number of advertisement messages that connectivity peers must handle in our simulated SIP domain including 30,000 UAs: 99 percent of nodes process fewer than seven advertisement messages per minute, and the remaining 1 percent process between eight and 48 messages per minute, thus resulting in a reduced per-node overhead. However, this confirms that hubs should be chosen carefully, with a preference for nodes with enough computational and bandwidth resources, for example, using the dynamic protocol proposed by Chawathe et al. for the Gia P2P network [16].

Media Sessions Relaying Performance

This section analyzes the overlay support for media sessions, in particular when hole punching fails and relaying is required. To prevent resource wasting, a media relay is typically chosen by a UA immediately before the establishment of a media session. Various types of media flows are considered, differing in the amount of consumed bandwidth. In particular, assuming b bit/s is the consumed bandwidth unit, five types of flows requiring nb (1 ≤ n ≤ 5) bit/s are defined. The flow type is randomly selected (with uniform distribution) when a new session starts. We also define B_i as the amount of bandwidth that peer i can offer for relaying media sessions. For the sake of simplicity, B_i is assumed to be the same for each connectivity peer and equal to 5b bit/s. However, in a real scenario, this value could vary according to node capabilities.
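The bandwidth model above can be sketched with capacities expressed in units of b. The admission rule (a relay declines a flow that would exceed its capacity) is an assumption made for illustration, and the class and function names are hypothetical.

```python
import random

RELAY_CAPACITY = 5          # B_i = 5b, expressed in units of b bit/s

class MediaRelay:
    """Tracks the bandwidth (in units of b) committed to relayed flows."""

    def __init__(self, capacity=RELAY_CAPACITY):
        self.capacity = capacity
        self.flows = []

    def accept(self, n):
        # Accept a flow of n*b bit/s only if it fits in the residual capacity.
        if not 1 <= n <= 5 or sum(self.flows) + n > self.capacity:
            return False
        self.flows.append(n)
        return True

def random_flow_type(rng):
    """Flow type n drawn uniformly from 1..5, as in the simulation."""
    return rng.randint(1, 5)

relay = MediaRelay()
print([relay.accept(n) for n in (3, 2, 1)])  # [True, True, False]
```

With B_i = 5b, a relay can carry a single flow of the largest type or several smaller ones.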

We start the evaluation of the DISCOS support for media sessions from the estimation of the call failure probability because it is the parameter that mainly affects the quality of service perceived by users. A session can fail because either an available relay cannot be found, or the relay is found but becomes unavailable during the session (e.g., because it disconnects from the network).

With respect to the first problem, we never observed such an event during simulation: a UA with limited connectivity was always able to find a media relay. This result suggests that, with our assumptions about the number of media sessions requiring a relay, the probability for this event to occur in a DISCOS environment can be considered negligible. The second issue could be mitigated by implementing proper relay back-up policies. As shown in Fig. 3d, the media session can fail in about 0.6–0.65 percent of cases, but the selection of a single back-up relay (that handles the communication in case the first relay fails) significantly reduces this probability, and further reductions are possible by increasing the number of relay nodes. The blocking probability remains low even in the unlikely case in which 98 percent of the users are involved in a call (i.e., almost all users are on the phone). The overhead deriving from the search for back-up relays is depicted in Fig. 3e, which plots the average number of peers that must be contacted to find K available media relays. For a reasonable number of simultaneous sessions, this value remains low. However, we set the number of back-up relay nodes to one, which is a reasonable trade-off between the probability of a session drop and the additional complexity that results when a UA must search for a back-up relay node before starting media sessions.

Finally, we analyzed the distribution of load among connectivity peers. In particular, Fig. 3f shows the distribution of the number of media flows simultaneously handled by media relays. It can be observed that although media flows have different bandwidth requirements, most relays simultaneously handle no more than one media session. Thus, a good load balancing among peers is achieved.

Conclusions

This article presents a distributed infrastructure, called DISCOS, that aims at providing connectivity service to hosts behind middleboxes. This solution extends current centralized approaches (and overcomes their scalability and robustness limitations) by integrating middlebox traversal functionalities into edge nodes. The article also presents the mechanisms that can be used to manage such an infrastructure and exploit its services. The proposed infrastructure is based on an unstructured peer-to-peer paradigm and proved to be extremely effective in locating suitable relays and distributing media sessions evenly among the available connectivity peers. Results confirm that the overhead for managing the overlay is low, that each host is able to locate a suitable connectivity peer with a small number of messages (hence, in a very short time), and that the blocking probability of a new media call is negligible even for a very high load. Although our simulations cannot cover a nationwide network (because of processing and memory constraints), we are confident that results can be extended to such an environment because the distributed infrastructure is based on the scale-free topology, which is the key to achieving these results and ensuring overlay scalability and robustness.

Future work aims to validate the proposed infrastructure in non-SIP environments and to address security issues more exhaustively.

Acknowledgment

The authors would like to thank Marco Mellia, who was instrumental in obtaining a proper characterization of Skype user agents.

References

[1] B. Ford, P. Srisuresh, and D. Kegel, “Peer-to-Peer Communication across Network Address Translators,” USENIX Annual Tech. Conf., Anaheim, CA, Apr. 2005.

[2] J. Rosenberg et al., “SIP: Session Initiation Protocol,” IETF RFC 3261, June 2002.

[3] C. Jennings and R. Mahy, Eds., “Managing Client Initiated Connections in SIP,” http://tools.ietf.org/html/draft-ietf-sip-outbound-11, Nov. 2007.

[4] J. Rosenberg, “Interactive Connectivity Establishment (ICE): A Protocol for NAT Traversal for Offer/Answer Protocols,” http://tools.ietf.org/html/draft-ietf-mmusic-ice-18, Mar. 2008.

[5] J. Rosenberg, R. Mahy, and P. Matthews, “Traversal Using Relays around NAT (TURN): Relay Extensions to Session Traversal Utilities for NAT (STUN),” http://www3.tools.ietf.org/html/draft-ietf-behave-turn-07, Feb. 2008.

[6] J. Rosenberg et al., “Session Traversal Utilities for (NAT) (STUN),” http://tools.ietf.org/html/draft-ietf-behave-rfc3489bis-15, Feb. 2008.

[7] D. MacDonald and B. Lowekamp, “NAT Behavior Discovery Using STUN,” http://www3.tools.ietf.org/html/draft-ietf-behave-nat-behavior-discovery-03, Feb. 2008.

[8] D. A. Bryan and B. B. Lowekamp, “Decentralizing SIP,” ACM Queue, vol. 5, no. 2, Mar. 2007.

[9] S. A. Baset and H. Schulzrinne, “An Analysis of the Skype Peer-to-Peer Internet Telephony Protocol,” IEEE INFOCOM ’06, Barcelona, Spain, Apr. 2006.

[10] P. Biondi and F. Desclaux, “Silver Needle in the Skype,” Black Hat Europe 2006, Amsterdam, The Netherlands, Mar. 2006.

[11] R. Albert and A.-L. Barabási, “Statistical Mechanics of Complex Networks,” Rev. Modern Physics, 74, 2002, pp. 47–97.

[12] L. A. Adamic et al., “Search in Power Law Networks,” Physical Rev., E 64, 2001.

[13] J. Seedorf, “Security Challenges for Peer-to-Peer SIP,” IEEE Network, vol. 20, no. 5, Sept. 2006.

[14] C. Jennings et al., “Resource Location and Discovery (RELOAD),” http://www.p2psip.org/drafts/draft-bryan-p2psip-reload-04.txt, June 2008.

[15] M. Casado and M. J. Freedman, “Peering through the Shroud: The Effect of Edge Opacity on IP-Based Client Identification,” USENIX/ACM Int’l. Symp. Networked Sys. Design and Implementation, Cambridge, MA, Apr. 2007.

[16] Y. Chawathe et al., “Making Gnutella-Like P2P Systems Scalable,” ACM SIGCOMM ’03, Karlsruhe, Germany, Aug. 2003.

Biographies

LUIGI CIMINIERA ([email protected]) [M] is a professor of computer engineering in the Dipartimento di Automatica e Informatica at Politecnico di Torino, Italy. His research interests include grids and peer-to-peer networks, distributed software systems, and computer arithmetic. He is a co-author of two international books and more than 100 contributions published in technical journals and conference proceedings.

GUIDO MARCHETTO ([email protected]) received his Ph.D. in computer engineering in April 2008 and his laurea degree in telecommunications engineering in April 2004, both from Politecnico di Torino. He is a post-doctoral fellow in the Department of Control and Computer Engineering at the Politecnico di Torino. His research topics are packet scheduling and quality of service in packet-switched networks, peer-to-peer technologies, and voice over IP protocols. His interests include network protocols and network architectures.

FULVIO RISSO ([email protected]) received his Ph.D. in computer and system engineering from Politecnico di Torino in 2000 with a dissertation on quality of service in packet-switched networks. He is an assistant professor in the Department of Control and Computer Engineering of Politecnico di Torino. His current research activity focuses on efficient packet processing, network analysis, network monitoring, and peer-to-peer overlays. He is the author of several papers on quality of service, packet processing, network monitoring, and IPv6.

LIVIO TORRERO ([email protected]) is a Ph.D. student in computer and system engineering in the Department of Control and Computer Engineering at Politecnico di Torino. He received his laurea degree in computer engineering from Politecnico di Torino in November 2004. His research topics include voice over IP protocols, IPv6, and peer-to-peer technologies and their NAT/firewall related issues.


major trend evident in mobile communicationsystems today is the use of multiple network-enabled terminal devices to deliver informationto a single user. These devices can be equipped

with multiple network interfaces, of which some, all, or nonemay be satisfactorily operational at a given point in time. Thistrend toward multiple devices, which we refer to as “user mul-tihoming,” combined with what is commonly termed “devicemultihoming,” introduces the potential for multiple end-to-end paths between two mobile communicating parties. Thesealternate paths can be facilitated with the support of special-ized intermediaries, called middleboxes or service proxies toeither establish or maintain a communication session. For thepurposes of our discussion here, the common terms middle-box, intermediary, and our own term — service proxy (or SP)— are synonymous.

Although some of the aspects of user multihoming willdiminish with the next generation of devices that will provideintegrated functionality, the availability of different paths maybe useful for different purposes or at different times. There-fore, multihoming can be exploited to maximize the serviceofferings to mobile users, for example, to take advantage of amore cost-effective network interface available on anotherdevice by re-routing ongoing communications sessions throughthat device.

One of the primary requirements for exploitation of thisnetwork/path diversity is the ability to seamlessly switch com-munication sessions between different networks/paths andpotentially add or remove intermediaries in the process.Although, there have been numerous proposals fornetwork/path switching, such as [1–3], they do not take advan-tage of intermediaries, namely, middlebox devices. One majoradvantage of being able to utilize intermediaries is that theymay be able to perform intensive mobility handling operationssuch as content adaptation or time-shifting at the edges of thecore networks, thus saving last-hop bandwidth and terminaldevice processor/power capacity. Additionally, intermediariescan be used to provide indirect connectivity to the core net-work at times when no direct connectivity is possible.

This article analyzes the requirements for exploiting the oppor-tunities that arise as a result of user and device multihoming inmobile communications systems. It then proposes a solution thatenables the use of intermediaries to redirect or transform dataflows so that they can be received by the best available terminaldevice, or combination thereof, at any given time.

The article presents a vision for mobility management thatcaptures the potential for mobility support using middleboxesand then describes an approach to solving one of the funda-mental requirements to realize this vision. The following sec-tions elaborate on the details of a proposed scheme,concluding with a summary and review of future work.

OverviewThe conceptual foundation on which this article is based isthat users are served by a PN comprising loose, dynamic con-glomerations of many devices focused on serving a particularuser. PNs, as depicted in Fig. 1, encompass personal area net-works (PANs) [4] and may incorporate other terminal andnon-terminal devices that belong to public infrastructure ordevices at distant locations in the network.

In a PN, terminal devices (TDs) terminate an end-to-endconnection on behalf of a user to either display or store theinformation received via the connection. In Fig. 1 the mobilephone and laptop computer, as well as the large display, allrepresent potential terminal devices. The non-terminal enti-ties (middleboxes) act as an intermediary relay point for anend-to-end connection. These entities intercept information intransit from one end of an end-to-end connection, possiblyprocess, and then forward toward the other end of the con-nection. The processing provides either high-level adaptation,filtration, and transformation of application data, or low-levelconnectivity provision. Figure 1 presents four examples ofnon-terminal entities. The mobile phone and laptop computerare able to serve as bridges between various access technolo-gies. The third and fourth examples of service proxy devicesare a remotely located application-aware general packet radioservice (GPRS) gateway and a mobile router.

Stephen Herborn and Aruna Seneviratne, NICTA

Abstract
Users can be served by multiple network-enabled terminal devices, each of which in turn can have multiple network interfaces. This multihoming at both the user and device level presents new opportunities for mobility handling. Mobility can be handled by utilizing devices, namely, middleboxes that can provide intermediary routing or adaptation services. This article presents an approach to enabling this kind of mobility handling using the concept of personal networks (PNs). PNs consist of dynamic conglomerations of terminal and middlebox devices tasked to facilitate the delivery of information to and from a single human user. This concept creates the potential to view mobility handling as a path selection problem because there may be multiple valid terminal device and middlebox configurations that can successfully carry a given communication session. We present details and an evaluation of our approach, based on an extension of the Host Identity Protocol, which demonstrate its simplicity and effectiveness.

Dial “M” for Middlebox Managed Mobility


Managing Personal Networks
One of the primary management functions for PNs is the selection of service paths. Consider a single unicast communication session between two sets of terminal devices belonging to PN A and PN B in a content distribution application. Assume that certain parts of the requested content are available in multiple and different forms on different servers; for example, advertisements can be served from a different location than the rest of the content. Non-terminal entities can be used if required and also can be composed to form an aggregated end-to-end service. The result is a number of valid end-to-end paths that can be used to carry the communication session, as shown in Fig. 2. To facilitate this, at least one candidate path from one set of terminal devices to the other must be discovered, selected, and configured. In the simplest case, the best path is one directly between two terminal devices. In other cases, it may involve some non-terminal intermediary. To realize this, it is necessary to develop mechanisms for:
• Discovering and constructing end-to-end paths
• Dynamically configuring and utilizing these paths

The first requires a means to discover and select appropriate intermediaries that are distributed throughout a network and compose an end-to-end path. Such mechanisms are beyond the scope of this article but are addressed in numerous existing works including [5]. The second requires a means to switch ongoing communication sessions between terminal devices and to transparently insert intermediaries into, or remove them from, the end-to-end path. This section focuses on the second issue.

For the second issue there have been numerous attempts to develop mechanisms to dynamically discover, construct, and configure end-to-end paths, for example, [6] and [7]. However, the deployment of these schemes is predicated on development and large-scale deployment of proprietary system components. The contribution this article makes is to show that it is possible to achieve the desired functionality through minor modification to a pre-existing protocol. Specifically, we show that it is possible to extend the mobility and multihoming capability present in the Internet Engineering Task Force (IETF) Host Identity Protocol (HIP) [8].

The proposal extends the IETF HIP with the capability for movement of communication sessions between terminal devices, as well as the transparent insertion and removal of intermediaries (middleboxes), while retaining ultimate control at the terminal devices on either side of an end-to-end connection through the use of a central functional building block we call “identity delegation.” This is explained in further detail in the following section.
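To make the path selection view of mobility handling concrete, the following is a minimal Python sketch; it is not part of the proposed scheme. It assumes that each candidate path is described by a terminal device, an ordered tuple of middleboxes, and a single numeric cost. The `Path` class, the `select_path` function, the cost values, and the device names are hypothetical and chosen only to mirror the scenario of Fig. 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """One candidate end-to-end path (hypothetical model)."""
    terminal: str            # device that terminates the session
    middleboxes: tuple = ()  # intermediaries on the path, in order
    cost: float = 0.0        # assumed metric; lower is better

    def devices(self):
        """All devices that must be present for the path to work."""
        return (self.terminal,) + self.middleboxes


def select_path(candidates, unavailable=frozenset()):
    """Return the cheapest path that uses no unavailable device, or None."""
    blocked = set(unavailable)
    usable = [p for p in candidates if not blocked & set(p.devices())]
    return min(usable, key=lambda p: p.cost) if usable else None


candidates = [
    Path("display", ("laptop",), cost=2.0),  # laptop acts as SP for the display
    Path("laptop", (), cost=3.0),            # laptop used as a terminal device
    Path("pda", (), cost=5.0),
]

print(select_path(candidates).terminal)               # display, via the laptop
print(select_path(candidates, {"display"}).terminal)  # display switched off
print(select_path(candidates, {"laptop"}).terminal)   # laptop battery depleted
```

Re-running `select_path` whenever a device joins or leaves reproduces the kind of re-selection illustrated in Fig. 2. How costs are obtained and how the chosen path is then configured are outside the scope of this sketch.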

Identity Delegation
The overwhelming majority of user-level applications that require network connectivity obtain it by creating a socket bound to some local interface identifier (i.e., IP address) and possibly statically connected to a peer identifier. End-to-end connections based on sockets cannot cope well with changes in identifiers (i.e., IP addresses) that occur as a result of mobility. This can be solved at the middleware level by providing applications with “bind-able” identifiers that can be assigned on a per-flow basis and delegated between physical devices. This enables applications on both ends of a communication session to remain oblivious to mobility management activities performed at the operating system level or by mobility handling middleware. To transfer sessions, first there must be assurance that devices within a given PN are able and willing to accept any incoming data stream that is redirected to them. This is not currently the case for a number of reasons. First, no device blindly accepts an unsolicited, incoming data stream. Second, the redirected stream may be forwarded toward a port on the new device that is already occupied by another application. Finally, even if the device can accept the redirected stream, it is not aware of the application to which the data should be delivered. The use of cryptographic identifiers provides the means to solve at least the first issue by decoupling network locators from identifiers and by providing strong authentication of the identities used to send, receive, and forward data. Port conflicts can be solved with stateful connection management similar to the approach used by network address translators (NATs). What is missing is a way to delegate identifiers between devices, and a solution for this problem is proposed here and described in detail later.

The capacity for identity delegation makes possible a number of interesting application scenarios. The scenarios considered here are identity delegation toward a single intermediate SP and toward an arbitrary number of intermediate SPs. Figures 3 and 4 illustrate how this could be achieved with the proposed scheme using HIP and IPSec. Background on HIP and its relationship to IPSec is provided in [8].

Identity Delegation Toward Terminal Devices
A clear, potential application for host identity delegation is to enable single intermediary hosts to be inserted dynamically or removed from an end-to-end session. This occurs transparently to the transport and higher layers so it does not break Transmission Control Protocol (TCP) connections. Figure 3 provides a generic step-by-step illustration of the process of inserting an intermediary, SPA, in between two terminal devices TDB and TDA. This process starts with TDA or TDB initiating the action. In the example, TDA is the initiator as shown in Fig. 3c.

Figure 1. A personal network. (Labeled elements: personal network, public Internet, personal area network, local area network; access technologies: GPRS, Wi-Fi, Bluetooth, Ethernet.)

Identity Delegation Toward Multiple SPs
Identity delegation assists service composition by allowing two hosts engaged in a communication session, TDA and TDB, to delegate their identity to the head and tail of the composed SP chain, SP1 and SP2. This enables an arbitrary number of intermediate SPs to be inserted in between the head and the tail transparently to TDA or TDB. Figure 4 follows Fig. 3 and depicts this usage case as a sequence of twelve consecutive steps.

Since application data streams can consist of several separable atomic components that can be routed independently, for example, audio and video, and because some intermediary SPs can split or join certain application data flows, it is possible to construct an end-to-end SP path that is composed of two or more converging subpaths. The benefits provided by splitting and joining media include the potential for selection of hybrid service paths that are more efficient than any available serial service path. In some cases, it may be desirable to construct service paths that do not completely converge, for example, to deliver the audio component of a media stream to a different network interface or terminal device than the rest of the stream.

For the purpose of discussion, it is assumed that SPs participate in some common directory, the administration of which can be centralized or distributed. Service selection and discovery is not addressed in detail here but is discussed in the context of related work.

Design and Implementation
In current systems, IP addresses are the most common type of identifier used for end-to-end communication. However, IP addresses are strongly bound to topological location and thus not suitable for the purpose of delegation that is required to realize the scenarios described in the previous section. As a result, we base our design on HIP, which uses identifiers that are decoupled from network topology. This section first provides some background on HIP and IPSec, a general description of the identity delegation approach compared with a naïve private key duplication-based approach, and then delves into the specifics of the prototype implementation.

Host Identity Protocol and IPSec
Schemes such as mobile IPv6 (MIPv6) [9] and the HIP [10] provide a static identifier, referred to as a home address (HoA) in the former and a host identity tag (HIT) in the latter, which is separate from the routable IP or IPv6 address of the host. The approach presented here is based on HIP, although the general approach is applicable to any similar scheme.

Figure 2. Service-path-based mobility (example): a) many candidate paths, one selected (bold line) using laptop as SP for display; b) mobile router appears and offers a better path, previous path replaced by new path; c) mobile router leaves coverage range, display is switched off, path readjusts, laptop now used as a terminal device; d) laptop battery depleted, triggering selection of different end-to-end path toward PDA. (Labeled elements: servers #1 to #3, service proxies, PN ‘A’ with PDA, display, and laptop, PN ‘B’, mobile router joining and departing.)

HIP is an end-to-end communication protocol that introduces a thin layer of resolution between the network and transport layers, decoupling sockets from network addresses. Instead of binding to IPv6 addresses, applications bind to 128-bit HITs, flat (non-hierarchical) cryptographic identifiers generated by hashing a public key. Due to the decoupling between the network and transport layers, HIP enables applications on a mobile host to continue communication oblivious to changes in local network addresses. New HIP communication sessions are preceded by a challenge-response-based authentication process.
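The idea of a flat 128-bit identifier obtained by hashing a public key can be illustrated with a short Python sketch. This is a simplification under stated assumptions: it truncates a SHA-256 digest to 128 bits and prints it in IPv6 notation, whereas the actual HIT construction is defined by the HIP specification [8] and differs in detail. The function name `toy_hit` and the sample key bytes are hypothetical.

```python
import hashlib
import ipaddress


def toy_hit(public_key: bytes) -> ipaddress.IPv6Address:
    """Derive a 128-bit identifier from a public key (illustrative only).

    Assumption: a SHA-256 digest truncated to 16 bytes. This is not the
    encoding used by real HIP implementations.
    """
    digest = hashlib.sha256(public_key).digest()
    return ipaddress.IPv6Address(digest[:16])


hit = toy_hit(b"example public key bytes")
print(hit)  # same key always yields the same identifier, whatever the host's IP address
```

Because the identifier depends only on the key, a socket bound to it is unaffected when the underlying network address changes.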

As HIP deals only with control signaling, standard IPSec is used to carry the actual data traffic. The implementation of HIP referred to in this article uses the recently proposed bound end-to-end tunnel (BEET) mode of IPSec operation that eliminates the requirement to retain the source and destination HIT as an encapsulated header in each transmitted packet [10]. The setup of a HIP connection between two hosts results in a pair of unidirectional BEET mode IPSec security associations (SAs) at each host. The security parameter index (SPI) for each SA is contained in the I2 and R2 base exchange packets and used by the hosts to determine the source and destination HIT. The mapping between IPSec SPI and source/destination HIT is performed by the BEET mode association, which simply replaces the network layer addresses with HITs after decryption.
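The SPI-to-HIT mapping described above can be pictured as a small lookup table kept at each host. The Python sketch below is a hypothetical model only: the class `BeetAssociations`, its method names, and the placeholder HIT strings are assumptions for illustration and do not correspond to any real IPSec or HIP API.

```python
from typing import Dict, NamedTuple, Optional


class HitPair(NamedTuple):
    src_hit: str
    dst_hit: str


class BeetAssociations:
    """Maps the SPI of an inbound SA to the HIT pair it stands for."""

    def __init__(self) -> None:
        self._by_spi: Dict[int, HitPair] = {}

    def install(self, spi: int, src_hit: str, dst_hit: str) -> None:
        """Record the HIT pair learned during the base exchange."""
        self._by_spi[spi] = HitPair(src_hit, dst_hit)

    def remove(self, spi: int) -> None:
        """Delete an SA, for example before re-establishing it."""
        self._by_spi.pop(spi, None)

    def restore_hits(self, spi: int) -> Optional[HitPair]:
        """After decryption, recover the HITs that the packet did not carry."""
        return self._by_spi.get(spi)


table = BeetAssociations()
table.install(0x1001, "HIT-TDA", "HIT-TDB")  # placeholder HIT values
print(table.restore_hits(0x1001))
print(table.restore_hits(0x2002))  # unknown SPI: no association
```

The point of the sketch is that packets need to carry only the SPI; the HITs seen by the transport layer are restored locally from the association.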

Figure 3. Insertion of a single host. (Panels (a) to (f); labels: data flow, decision to utilize service proxy SPA, proxy signaling, HIP signaling, IPSec setup; nodes: TDA, TDB, SPA.)

Figure 4. Insertion of arbitrary number of intermediaries. (Panels (a) to (f); labels: data flow, proxy signaling, HIP signaling, IPSec setup, arbitrary service proxy configuration now possible; nodes: TDA, TDB, SPA1, SPA2, SPAx, SPAy.)

HIP mobility handling comprises an authenticated location update procedure in which the mobile host delivers a signed location update packet to the correspondent host with details of the new network layer address. Our contribution is an extension to standard HIP that provides a means to delegate HITs between physical hosts on the fly in response to a mobility event.

Depending on local security policy, either the mobile host or the correspondent host may ask to re-key the connection in response to mobility. Re-keying also can be requested by either host after a certain time period has elapsed. Re-keying involves the deletion of existing IPSec SAs and the establishment of new ones with a newly generated session key. If re-keying is not required, existing SAs are deleted and re-established with the previous session key. This reconfiguration of IPSec SAs is transparent to the transport layer.

To mitigate the effect of the implementation-specific security policy on experimental results, a base exchange was substituted for an update procedure in the work presented here. A base exchange is, in fact, in most cases roughly equivalent to an update, with the main difference being modified HIP header fields.

Transferring Cryptographic Identities: Duplication versus Delegation
The use of cryptographic identifiers, as in HIP, decouples the identifier used by applications and transport layer sockets from the locator used for routing. A natural consequence of decoupling is that it is possible to transfer the identity between different physical devices, which is what we want to achieve to be able to insert intermediate middleboxes into an ongoing communication session. However, for a host to prove that it is authorized to use a certain identifier, it must present messages signed with the private key corresponding to the identifier. This requirement can be met in two ways, either by duplicating the requisite private key on any host that requires it, or by forwarding location update signaling packets to be signed on demand. The second approach is advocated in this article because private keys should, in principle, remain private. To avoid the introduction of another acronym, the abbreviation HIT is used interchangeably with the term cryptographic identifier in the remainder of this article.

In the approach proposed in this article, hosts that wish to use a delegated HIT are required to forward location update signaling packets to the owner of the corresponding private key for signing. The owner of the private key can then, at its discretion, sign the messages and return them to the host that requested them, which can in turn forward them on to the correspondent host, thereby verifying the claim to use the HIT. The main advantage of this approach is that it avoids the dissemination of private keys. This approach also allows temporary delegation of a HIT because the destination host can use the HIT only for as long as the correspondent host does not request a re-keying procedure to be performed. An additional advantage is that because the location update signaling messages forwarded to the key-owner host for signatures also contain the HIT and IP addresses of the correspondent host, the key-owner host can keep track of the correspondent hosts with which communication sessions are being conducted using its identity. Should the key-owner wish to revoke the use of its HIT from a certain destination host, it need only perform a re-key directly with the correspondent host. The drawback of this approach is that if the key-owner host disappears, any further requests to sign location update signaling messages cannot be processed. This means that a destination host may be forced to terminate a communication session if the correspondent host initiates a re-key.
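The sign-on-demand idea can be sketched in a few lines of Python. The sketch is illustrative and rests on loud assumptions: it uses textbook RSA with a tiny, insecure toy key so that it runs with the standard library alone; the update message format, the class names `KeyOwner` and `Delegate`, and the `declined` set that models the owner's discretion are all hypothetical and are not taken from HIP or from the prototype.

```python
import hashlib

# Toy RSA key pair (61 * 53 = 3233). Far too small for real use; an
# assumption made only so the example runs with the standard library.
N, E, D = 3233, 17, 2753


def digest(message: bytes) -> int:
    """Reduce a SHA-256 digest into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def verify(message: bytes, signature: int) -> bool:
    """Check a signature using only the public values (N, E)."""
    return pow(signature, E, N) == digest(message)


class KeyOwner:
    """Holds the private exponent and signs update packets at its discretion."""

    def __init__(self) -> None:
        self._d = D
        self.declined = set()   # delegates the owner no longer signs for
        self.peers_seen = []    # correspondent hosts named in signed requests

    def sign_on_demand(self, delegate: str, update: bytes, peer: str):
        if delegate in self.declined:
            return None
        self.peers_seen.append(peer)
        return pow(digest(update), self._d, N)


class Delegate:
    """A host using a delegated HIT; it never holds the private key."""

    def __init__(self, name: str, owner: KeyOwner) -> None:
        self.name, self.owner = name, owner

    def location_update(self, new_ip: str, peer: str):
        update = f"UPDATE locator={new_ip} peer={peer}".encode()
        signature = self.owner.sign_on_demand(self.name, update, peer)
        return None if signature is None else (update, signature)


owner = KeyOwner()
proxy = Delegate("SP", owner)
packet = proxy.location_update("192.0.2.10", "TDB")
print(verify(*packet))                             # correspondent accepts the update
owner.declined.add("SP")
print(proxy.location_update("192.0.2.20", "TDB"))  # owner declines to sign: None
```

The private exponent never leaves `KeyOwner`, and the owner learns which correspondent hosts are involved from the requests it signs, which mirrors the two advantages claimed for delegation over key duplication.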

One potential philosophical ramification of the delegation approach (on HIP specifically) is that so-called host identities no longer explicitly belong to a specific host but are capable of being moved around between physical hosts, contrary to the original intention of the designers of HIP. The proposed approach limits the architectural impact of this by ensuring that identities are delegated only temporarily and can never be used without the explicit consent of the actual entity that the host identity serves to identify.

Our proposal changes the notion of end-to-end security of HIP because even though communication is still encrypted (IPSec), all nodes explicitly included in the service path can read the payload. This is a desired functionality because we envisage that nodes included on the service path may be tasked with some application-layer processing such as content adaptation.

Implementation Details
We implemented the identity delegation approach described in this article using publicly available code from the Infrastructure for HIP project (InfraHIP) [11] as a base. Our extensions were evaluated on a Debian Linux system running kernel version 2.6.16. The remainder of this section provides an analysis of the signaling procedures specific to the implementation. Our description refers to a simple scenario: a mobile terminal device (TDA) communicating with a static correspondent terminal device (TDB). The analysis commences with a description of how TDA may delegate its identity to an intermediary SP.

Figure 5. Proxy insertion by TDA. (Message sequence between TDA, SPA, and TDB: connect and ack exchange, setup of IPSec secured signaling, HIP base exchange packets I1, R1, I2, and R2 relayed as encapsulated payloads p[...], and transfer of IPSec security association details.)

Figure 6. TCP sequence number vs. time plot: two service proxies. (Axes: time in seconds, TCP sequence number; annotated gaps (a), (b), and (c).)

Figure 5 depicts the signaling involved in the delegation process. It is assumed that there is a pre-existing trust relationship between terminal devices belonging to a single PN. The delegation process starts when TDA queries the SP for the IP address (IPSP) that it wants to use for the delegated identity. At the same time, TDA and the SP establish a transport mode IPSec channel. This channel carries encapsulated HIP signaling traffic, as well as the IPSec security policy and association information used to establish the BEET mode IPSec association used for application data. The HIP signaling traffic between TDA and SP is sent as encapsulated payloads, indicated in Fig. 5 by “p[…].” The SP relays any HIP signaling traffic either to TDA or TDB without modification. The whole process of identity delegation and subsequent session redirection is transparent to applications running on TDB. The modular nature of our design means that the scheme can be implemented as an extension to an existing HIP-enabled network stack.

As mentioned previously, intermediary SP insertion also can be performed by TDB to construct chains of two or more composed SPs between TDA and TDB. The signaling involved in the insertion of the second SP is equivalent to the SP insertion by TDA shown in Fig. 5.

Experimental Evaluation
Evaluation of the identity delegation scheme was performed for the second usage scenario described previously. The results of this scenario are also applicable to the other scenarios because they utilize the same identity delegation mechanism. The intention of the experiments was twofold: first, to provide a general evaluation of HIP performance in a real system, and second, to show that the delegation approach does not result in any measurable performance drop compared to unmodified HIP. Initially, it was assumed that most of the hand-off latency overhead would be due to heavy CPU load caused by the cryptographic operations required to sign HIP signaling messages and establish IPSec sessions. As such, it was expected that the performance of both approaches would be equivalent, provided that the machines used to sign HIP messages and set up IPSec sessions were equal in terms of processing power. These assumptions were confirmed by the evaluation results presented below.

The experiment was performed to evaluate the scenario of inserting intermediary SPs, which, for example, can be a content adaptation SP between two devices engaged in a TCP communication session. The purpose of evaluating this scenario was to demonstrate that the TCP connection between the two devices remains unbroken and that the scheme does not cause any specific harm to the normal performance of higher-layer protocols. In reality, altering the end-to-end path in midsession may introduce some degradation in TCP performance if the new path is of lower quality than the old path; however, this issue is outside the scope of our proposal.

In these experiments, an initial communication session was established from TDA (600 MHz PIII) toward TDB (500 MHz Celeron). The evaluated scenario was the insertion of two 3-GHz Pentium 4 service proxies, SPA1 and SPA2, in series between the initial TCP session end points, TDA and TDB. Figure 6 shows the resulting TCP sequence number vs. time plot.

The two large gaps (a) and (c) in the plot represent the respective times at which SPA1 and SPA2 were inserted in between TDA and TDB. From the plot it can be observed that the effects of the insertion of SPA2 are similar to those of the insertion of SPA1 in terms of latency and impact on TCP performance. However, the plot also demonstrates that insertion of multiple consecutive SPs does not result in any further drop in performance provided the SPs are powerful enough to handle the required IPSec sessions without CPU saturation. Some smaller gaps such as that indicated by (b) can be attributed to the CPU being utilized by the cryptographic operations required to set up a secure signaling channel prior to hand-off.

It is important to note that if the capability to delegate or transfer identity were not available, the session would have to be broken and restarted to insert or remove each intermediary proxy, causing the TCP sequence number to reset to the beginning each time.

Related Work
There are a number of previous and ongoing related works addressing inter-device mobility. On the other hand, there are fewer proposals that address the insertion of intermediary SPs as a mobility-handling technique.

The only proposal that has a similar functionality is Stream Control Transmission Protocol (SCTP). As with any other transport protocol, a node can be made to act as a proxy. In SCTP, when an end point (A) initiates a connection, the other end (B) can, with or without the knowledge of the initiator, open an association to another entity (C) and act as a proxy in between. Then B can either remove itself or make an association with C to receive the data from C. All this must be done before heartbeat signals are exchanged [12]. The HIP base specification provides no mechanism for inter-device mobility. However, [1] and [8] allude to the possibility of identity delegation using signed certificates. The approach proposed here provides a higher degree of transparency and control and is more responsive than delegation certificates.

Koponen, Gurtov, and Nikander provide a high-level discussion of the potential for HIP identity delegation with certificates [1]. References [2, 3] are related solutions that enable ongoing communication sessions to be moved between devices. In [2], Su creates a virtualized network interface that can be transferred between different devices and with it the associated communication sessions. It should be noted that none of these schemes conflict with HIP or with the scheme presented here; in fact, there is even potential for useful interoperation. A major difference of the delegation approach is that it focuses only on managing connectivity and can be implemented in such a way that it is transparent to at least one end of an end-to-end connection, if not both. There are also a number of related activities in the IETF associated with the locator/ID split [13, 14]. Of this work, the network-based schemes such as Locator/Identifier Separation Protocol (LISP) do not consider the use of middleboxes. The others, especially mobility Internet key exchange (MOBIKE) and SHIM6, focus on device mobility and do not support the use of middleboxes as described in this article.

Conclusion
Auxiliary devices that can serve as dynamically configured middleboxes introduce potential for a new approach to mobility handling that makes use of multiple available network interfaces and terminal devices. Mobility handling in this case means adapting to the changing status of an individual terminal by delivering application data flows to the best available terminal device(s) and utilizing the available service proxies (middleboxes) in the best possible way. This cannot be achieved using currently available technology.

This article addresses the problem by creating and exploiting PNs to provide enhanced mobility handling to mobile users. This article is focused on the specific problem of decoupling application data flows from specific devices by making use of multiple available network interfaces and terminal and service proxy devices.

We propose mechanisms to switch ongoing communication sessions between terminal devices and to transparently insert or remove intermediary service proxies, with the mobility management schemes at layers lower than the transport layer.

The proposed identity delegation approach is based on the HIP and allows the identity creator to retain full control over the use of their identity. The approach enables the movement of communication sessions between terminal devices, as well as the transparent insertion and removal of middleboxes, service proxies, or other intermediaries able to perform routing or adaptation.

Future Work
Future work in support for movement of communication sessions between terminal devices may include the coupling of identity delegation with “checkpointing” and transfer of transport, session, and application layer state to allow full application sessions to be moved between devices. Another problem worthy of investigation for security reasons is how to enable independent verification of whether or not two terminal devices belong to the same PN.

References
[1] T. Koponen, A. Gurtov, and P. Nikander, “Application Mobility Using the Host Identity Protocol,” Proc. ICT ’05, Madeira, Portugal, May 2005.
[2] G. Su, MOVE: Mobility with Persistent Network Connections, Ph.D. diss., Columbia Univ., Oct. 2004.
[3] R. Baratto et al., “MobiDesk: Mobile Virtual Desktop Computing,” Proc. MobiCom, Philadelphia, PA, Sept. 2004.
[4] I. G. Niemegeers and S. M. Heemstra De Groot, “From Personal Area Networks to Personal Networks: User Oriented Approach,” Wireless Personal Commun., vol. 22, no. 2, 2002, pp. 175–86.
[5] S. Herborn, A Personal-Network Centric Approach to Mobility Aware Networking, Ph.D. diss., Univ. New South Wales, Mar. 2007.
[6] S. Ardon et al., “MARCH: A Distributed Content Adaptation Architecture,” Int’l. J. Commun. Sys., vol. 16, 2003, pp. 97–115.
[7] B. Knutsson and H. Lu, “Architecture and Performance of Server Directed Transcoding,” ACM Trans. Internet Technology, vol. 3, 2003, pp. 392–424.
[8] R. Moskowitz et al., “Host Identity Protocol,” Internet RFC 5201; http://www.ietf.org/rfc/rfc5201.txt
[9] D. Johnson, C. Perkins, and J. Arkko, “Mobility Support in IPv6,” IETF RFC 3775; http://www.ietf.org/rfc/rfc3775.txt
[10] P. Nikander and J. Melen, “A Bound End-to-End Tunnel (BEET) Mode for ESP,” Internet draft; http://tools.ietf.org/id/draft-nikander-esp-beet-mode-08.txt
[11] InfraHIP project; http://infrahip.hiit.fi/
[12] T. Aura, P. Nikander, and G. Camarillo, “Effects of Mobility and Multihoming on Transport-Protocol Security,” IEEE Symp. Security and Privacy, Berkeley, CA, May 2004.
[13] D. Meyer, “The Locator/ID Split, Its Implications for IP Architecture, and a Few Current Approaches,” Future of Routing Wksp., APRICOT ’07; http://www.1-4-5.net/dmm/talks/apricot2007/locid
[14] D. Lee, X. Fu, and D. Hogrefe, “A Review of Mobility Support Paradigms for the Internet,” IEEE Commun. Surveys & Tutorials, vol. 8, no. 1, 2006.

Biographies
ARUNA SENEVIRATNE ([email protected]) received his Ph.D. in electrical engineering from the University of Bath, United Kingdom, in 1982. He is director of the NICTA Australian Technology Park Laboratory. He has held academic appointments at the University of Bradford, United Kingdom, Curtin University, and the University of New South Wales. He has also held visiting appointments at the University of Pierre and Marie Curie, Paris, and INRIA, Nice. In addition, he has been a consultant to numerous organizations including Telstra, Vodafone, Inmarsat, and Ericsson.

STEPHEN HERBORN ([email protected]) completed his Ph.D. at the University of New South Wales under the supervision of Professor Aruna Seneviratne. He works for Accenture consulting. Between 2003 and 2008, he was a member of the Networking and Pervasive Computing (NPC) program at NICTA in Sydney, first as a student and then as a full-time researcher. While at NICTA, his research activities centered around personal area networking, mobile networking, and context-aware computing.


IEEE Network • September/October 2008 0890-8044/08/$25.00 © 2008 IEEE

Choongul Park, Kitae Jeong, and Sungil Kim, KT Technology Lab
Youngseok Lee, Chungnam National University

Abstract
Currently, many customer devices are being connected to home networks. For this reason, it is expected that device management (DM) capabilities will be a powerful instrument for the service provider to cope with high maintenance costs, security concerns, and management issues related to home networks. Through DM, the service provider could provide valuable services such as auto-provisioning, remote configuration, firmware and software updates, diagnostics, monitoring, scheduling, and fraud management. However, network address translators (NATs) that are widely deployed in the home network environment prohibit DM operations from reaching user devices behind the NAT. In this article, we focus on NAT issues in the management of home network devices. Specifically, we discuss efforts relating to standardization and present our proposal to deploy DM services for VoIP and IPTV devices behind NATs. By slightly changing the behavior of Simple Network Management Protocol (SNMP) managers and agents and by defining additional management objects (MOs) to gather NAT binding information, we could solve the NAT traversal problem under a symmetric NAT. Moreover, we propose an enhanced method to search for the UDP hole binding time of the NAT box. For evaluation, we applied our method to 22 randomly selected VoIP devices out of 194 NATed hosts in the real broadband network and achieved a success ratio of 99 percent for exchanging SNMP request messages and a 26 percent enhancement in determining the UDP hole binding time.

NAT Issues in the Remote Management of Home Network Devices

In the broadband network, the service provider must communicate with customer devices located at the end of the last mile for administrative purposes. As user devices become more diverse and complex, the software that controls them will become more complex as well. Thus, for the development of effective device management (DM) software, it is important to deal with the various customer problems that could be centered on the device, such as firmware updates, software misbehaviors, or configuration errors.

The costs relating to deployment, customer care, operation, and management in a large-scale network could be significantly reduced through the DM services described in Fig. 1. The importance of the remote device management function will be emphasized further as the number of broadband subscribers carrying network-attached devices increases dramatically. In Korea, the number of subscribers to high-speed broadband Internet services is over 14 million.¹ Therefore, it is assumed that many network address translators (NATs) are deployed in home networks. According to recent statistics² from Korea Telecom (KT), which is the largest Internet service provider (ISP) in Korea, approximately 20 percent of customer devices are located behind a NAT middlebox.

A NAT [1] allows several computers to share a single public IP address. Private IP addresses are assigned to hosts behind a NAT, which means that communication between these hosts and public Internet nodes passes through the NAT, which maintains port and address translation information. Managing a device that is behind a NAT is one of the most urgent problems for service providers who are attempting to provide DM services to their customers. As shown in Fig. 1, a device management system (DMS) communicates with a device management client (DMC) to receive information from the remote devices and control them. In a NATed environment, a DMS, like other Internet applications, cannot avoid the NAT traversal problem. Namely, if a customer is using a NAT, the device behind the NAT cannot be controlled by the DMS.

The NAT traversal problem has been studied a great deal in order to support hosts using voice over IP (VoIP) and peer-to-peer (P2P) applications behind a NAT [2]. However, to the best of our knowledge, this issue has not been considered with the aim of controlling NATed hosts through DM, although some efforts for standardization have begun recently.

In this article, we aim to identify the issues and challenges relating to NATs when using DM to manage home network devices that are behind a NAT. In addition, we present a Simple Network Management Protocol (SNMP)-based approach to control hosts under NATs, which employs a User Datagram Protocol (UDP) hole-punching technique together with a method for estimating the correct timer value. The remainder of this article is organized as follows. We provide an overview of DM protocols and standards and then discuss the open issues of the remote management of NATed devices. We describe our proposal using SNMP for device management, give the results of our experiment, and also make comparisons with other DM methods. Our conclusions and suggestions for future research are presented at the end.

1 This was announced by the Korea Information Promotion Committee in the domestic information trend (vol. 7, no. 8) in March 2008.

2 The percentage of NAT penetration was produced by our device management system named U-CEMS, even though official figures were lower than ours by 5 percent in July 2007, which was referenced by the Digital Times news (www.dt.co.kr) published on July 30th, 2007.

Managing Devices Behind a NAT

Overview of Device Management Protocols

There are many device management protocols; the protocols we discuss here are presented in Table 1. These are standards-based protocols that are widely accepted around the world by many service and solution providers for device management.

Open Mobile Alliance device management (OMA DM) [3] uses extensible markup language (XML) for data exchange, more specifically the subset defined by Open Mobile Alliance data synchronization (OMA DS). OMA DM is designed to support Wireless Session Protocol (WSP), Wireless Application Protocol (WAP), Hypertext Transfer Protocol (HTTP), Object Exchange (OBEX), or similar transports as a transport layer protocol. The protocol specifies the exchange of packages during a session, with each package consisting of several messages and each message in turn being composed of one or more commands. The server initiates the commands, and the client is expected to execute the commands and return the result with a reply message.

Technical Report 069 (TR-069) [4] is also a device management protocol; it is defined in a Digital Subscriber Line (DSL) Forum technical specification. This application layer protocol provides the remote management function for end-user devices. Based on a bidirectional Simple Object Access Protocol (SOAP)³/HTTP protocol, it enables communication between a device and a DMS. Typical applications of TR-069 are safe auto-configuration and the control of other customer premises equipment (CPE) management functions within the integrated framework.

SNMP [5, 6] is popular in network management because it enables easy monitoring of the status of network-attached devices. SNMP defines a set of standards for network management, including an application-layer protocol, a database schema, and a set of data objects. Management data is specified in the form of variables on the managed systems, which describe the system configuration information. These variables can then be queried and sometimes set by SNMP manager applications.

Open Issues of Remote Management of NATed Devices

As explained earlier, several protocols have been standardized to support device management. However, with the advent of many NATs in the home network environment, the NAT becomes an important element to consider. Therefore, we present open issues in the remote management of NATed devices.

A NAT translates between internal private IP addresses and external public ones. NATs, particularly network address port translation (NAPT), one of the most common NAT systems, deal with communication sessions, which are identified uniquely by the combination of source IP address, source port number, destination IP address, and destination port number.

When a NATed device in a private network sends a packet to an external host, the NAT intercepts the packet and replaces the private source IP address and port number with a public IP address and port number. Subsequently, when the NAT receives an incoming packet addressed to that public IP address and port number, it replaces the packet's destination address and port number with the corresponding entry stored in the translation table and forwards the packet to the private network.
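The translation described above can be illustrated with a toy model. The following is a minimal sketch, not taken from the article: the class name `Napt`, the starting port 40000, and the documentation-range addresses are all assumptions chosen for illustration, and real NATs also track protocol, remote endpoint, and timers.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


@dataclass
class Napt:
    """Toy NAPT: maps private (IP, port) pairs to ports on one public IP."""
    public_ip: str
    next_port: int = 40000  # hypothetical start of the dynamic port range
    out_map: Dict[Endpoint, int] = field(default_factory=dict)
    in_map: Dict[int, Endpoint] = field(default_factory=dict)

    def outbound(self, src: Endpoint) -> Endpoint:
        """Rewrite the source of an outgoing packet, creating a binding if needed."""
        if src not in self.out_map:
            self.out_map[src] = self.next_port
            self.in_map[self.next_port] = src
            self.next_port += 1
        return (self.public_ip, self.out_map[src])

    def inbound(self, dst: Endpoint) -> Optional[Endpoint]:
        """Rewrite the destination of an incoming packet, or return None to drop it."""
        ip, port = dst
        if ip != self.public_ip:
            return None
        return self.in_map.get(port)


nat = Napt(public_ip="203.0.113.1")
public = nat.outbound(("192.168.0.10", 161))   # binding is created here
print(public)
print(nat.inbound(public))                     # translated back to the private host
print(nat.inbound(("203.0.113.1", 50000)))     # no binding, so the packet is dropped
```

The last call shows why a management server cannot simply send to a NATed device: without a binding created by earlier outgoing traffic, there is nothing to translate the destination to.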

The first issue in the remote management of a NATed device is to find an efficient way to facilitate the successful exchange of remote management request/response messages through the NAT box. A DMS cannot provide management authorities with management functions for a device behind a NAT because the management operations are blocked by the NAT.

Figure 1. Management of customer devices. [Diagram not reproduced. Recoverable labels: management authorities, northbound interface, DM server (DMS), southbound interface (OMA-DM, SNMP, TR-069), DM client (DMC); home network with NAT middlebox, notebook PC, IPTV STB, VoIP phone, and WiFi phone; mobile network with mobile phone, notebook, WiFi phone, and PDA. DM services: auto-provisioning, remote diagnostics and control, service quality management, firmware and software management, status and fault monitoring, operation supporting, inventory management, statistics and report management. DM operations: Get MOs, Set MOs, event notification, add/replace/copy MOs, Exec MO.]

3 SOAP originally stood for Simple Object Access Protocol, but this expansion was dropped in Version 1.2 of the standard because it was considered to be misleading.


On the other hand, a NAT maintains a table that maps the private addresses and port numbers to the public IP addresses and port numbers. It is important to note that this "binding" information can be created only by outgoing traffic from the internal host. In addition, most NATs maintain an idle timer for outgoing sessions and close the hole if no traffic is observed for the given time period. If we knew the default timer value of a NAT, we could minimize session management overhead. However, there is no way to know the default timer value without information about the NAT itself, such as its vendor or model. Therefore, a second important issue is to estimate the correct timer value for each unknown NAT box at a minimal cost. Without knowledge of the appropriate timer value for each NAT, the DMS must repeatedly send otherwise unnecessary probe packets to each NAT to find that value, which is costly in a large-scale network.
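The idle-timer behavior can be sketched as follows. This is an illustrative model only: the class `TimedBinding` is hypothetical, time is passed in explicitly rather than read from a clock, and it assumes that only outgoing traffic refreshes the timer, which is not true of every NAT.

```python
from typing import Optional


class TimedBinding:
    """UDP hole that closes after `idle_timeout` seconds without outgoing traffic."""

    def __init__(self, idle_timeout: float) -> None:
        self.idle_timeout = idle_timeout
        self.last_seen: Optional[float] = None  # None means no hole exists yet

    def outgoing(self, now: float) -> None:
        """Outgoing traffic opens the hole or refreshes its idle timer."""
        self.last_seen = now

    def incoming_allowed(self, now: float) -> bool:
        """Incoming traffic passes only while the hole is still open."""
        if self.last_seen is None:
            return False
        return now - self.last_seen < self.idle_timeout


hole = TimedBinding(idle_timeout=180.0)    # 180 s is only an example value
print(hole.incoming_allowed(now=0.0))      # False: nothing was sent yet
hole.outgoing(now=0.0)
print(hole.incoming_allowed(now=100.0))    # True: within the idle period
print(hole.incoming_allowed(now=200.0))    # False: the hole has closed
```

A device that sends keep-alives slightly more often than the idle timeout keeps the hole open with the fewest packets, which is why knowing the timer value matters.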

These two issues are not specific to DM but relate to all applications behind an unknown NAT. To provide DM services across a NAT, we look into efforts for standardizing NATed device management.

Efforts for the Standardization of NATed Device Management

When it comes to the issue of how to manage NATed devices, there exist similar discussions such as RFC 2962 [7] and the CALLHOME Birds of a Feather (BoF) draft [8]. First, RFC 2962 describes an SNMP application level gateway (ALG) for payload address translation, but this ALG has serious limitations, including its scalability and the speed of deployment of new applications. Moreover, it requires an upgrade to existing NATs. A CALLHOME BoF was held at the 64th IETF meeting and suggested a connection model that reversed the client-server role when establishing a connection. However, its activity ended without a clear result. For these reasons, in this section we focus on the efforts of the de facto DM standardization bodies to manage NATed devices. We discuss and compare these with our approach in detail in a later section.

Technical Report-111: TR-111 [9] extends the mechanism defined in TR-069 for the remote management of devices and is incorporated in TR-069 Annex G. TR-111 enables a management system to access and manage devices connected to a local area network (LAN) through a NAT. Two mechanisms were suggested in TR-111. TR-111 Part 1 is defined for the situation in which both the NAT and the device are TR-069 managed by the same DMS. TR-111 Part 2 provides a mechanism to realize a remote connection request to a device behind a NAT, in the event that the NAT does not support TR-069. It allows a DMS to initiate a TR-069 session with a device that is operating behind a NAT. The simple traversal of UDP through NATs (STUN) protocol mechanism defined in RFC 3489 [10] is included as Part 2 of TR-111, in which a device uses STUN to determine whether or not the device is behind a NAT. Then, if the device is behind a NAT with a private allocated address, the device uses the procedures defined in STUN to discover the binding timeout. The device sends periodic STUN binding requests at a sufficient frequency to maintain the NAT binding, on which it listens for UDP connection requests. The STUN-based mechanism requires a large amount of bandwidth but covers a wide range of usages, including VoIP service deployed behind an unmanaged or unfriendly NAT, or home networks with multiple NATs. Two alternative mechanisms based on Dynamic Host Configuration Protocol (DHCP) or universal plug and play (UPnP) have been proposed and discussed recently, as follows:

• DHCP-based TR-111: DHCP is a well-known protocol used by networked devices to obtain the parameters necessary for Internet connectivity. A device conveys its connection request URL to the NAT via DHCP option 60. The NAT in turn creates a proxy URL that it uses to relay communication back to the device. Then, the device can communicate the proxy URL as its connection request URL to the DMS. The NAT forwards packets on the proxy URL to the device connection request URL.

• UPnP-based TR-111: UPnP is a set of protocols that allow devices in the home network to be connected seamlessly. A device uses UPnP to discover the NAT, learn its public IP addresses, and open a forwarding port. After a port is opened, the device can register for notification of changes to the wide area network (WAN) IP address and communicate the connection request URL, with the public IP address of the NAT and the forwarding port, to the auto-configuration server (ACS).

However, these two alternatives also are limited, in that the NAT must support the DHCP option mechanism or the UPnP protocol.

Table 1. Overview of three device management standards.

| | OMA-DM | TR-069 | SNMP |
|---|---|---|---|
| Organization | OMA (note 1) | DSL Forum (note 2) | IETF (note 3) |
| Target devices | Mobile devices (mobile phones, PDAs, palmtop computers, etc.) | Fixed devices (residential gateways, VoIP phones, IPTV STBs, etc.) | Network elements (computers, routers, switches, terminal servers, VoIP phones) |
| Typical uses | Provisioning; configuration of device; software management; FOTA (firmware over the air); fault management | Auto-configuration; dynamic service activation; firmware management; status and performance control | Fault management; configuration management; account management; performance management; security management |
| Data model | OMA-DS (note 4) | XML | ASN.1 (note 5) |
| Transport protocol | TCP | TCP | UDP |
| Operations | Add, Get, Replace, Exec, Copy, Event | GetRPCMethods, Get/Set Parameter (Values, Names, Attributes), (Add/Delete)Object, Download, Inform, etc. | Get, GetNext, GetBulk, Trap, Set |
| Current specification | OMA Device Management V1.2 Approved Enabler (April 2006) | CPE WAN Management Protocol v1.1 (Dec. 2007) | RFC 3411 (Dec. 2002), RFC 3418 (Dec. 2002) |

Note 1: Open Mobile Alliance; http://www.openmobilealliance.org
Note 2: Digital Subscriber Line Forum; http://www.dslforum.org
Note 3: Internet Engineering Task Force (IETF); http://www.ietf.org
Note 4: Open Mobile Alliance Data Synchronization; the former name of OMA-DS is Synchronization Markup Language (SyncML).
Note 5: Abstract Syntax Notation One (ASN.1): a standard and flexible notation that describes data structures for representing, encoding, transmitting, and decoding data.

OMA CDM: There is standardization work being performed in the area that involves the discussion of converged DM (CDM) issues, such as the configuration and management of devices that support one or more bearer technologies for services. This is expected to standardize urgent management issues for devices within a consumer's network, including the assessment of device management when a device is located behind a NAT. However, the CDM standardization issue, which will be a part of the OMA-DM v2.0 work item document (WID), is still in its infancy.

Lessons from Standardization Efforts: Note that two approaches exist to provide DM services over NATs. One is to make NATs manageable like ALGs or friendly to device management protocols like proxies. Thus, the NAT box could relay the operation of a DMS to the device. The other is to adopt common NAT solutions like STUN and to make DMCs independent of NAT traversal mechanisms.

The first solution is not easily applied in the real environment because most currently deployed NATs do not support SNMP ALG, the DHCP option, or UPnP. Thus, the deployment cost becomes expensive. The second solution using STUN, a well-known NAT traversal mechanism, also has scalability and cost issues in that it requires additional dedicated servers and clients.

We therefore propose an SNMP-based device management scheme exploiting the UDP hole punching technique [11], which could easily be implemented and deployed in the current network. These issues are compared in more detail later.

Our Proposal: Using SNMP as Device Management

More than 700,000 IPTV set-top boxes and VoIP devices had been distributed to high-speed broadband customers in the KT network as of the end of 2007.⁴ Consequently, those customer devices must be managed by the integrated device management system. To manage those devices, we are adopting the popular SNMP version 2c as a DM protocol. There are several reasons for this choice. First, SNMP is a well-known protocol both for service providers and device vendors, which means that we can meet our time-to-market targets by rapidly implementing our requirements. In addition, vendors want to adopt a lightweight DM client to keep down the cost of devices in terms of required resources. However, we are trying to change this situation by considering SNMPv3, TR-069, or OMA-DM because of the issue of security, which could be a big challenge in the future.

Accordingly, in this article we propose an SNMP-based DM method for NATed hosts over unmanageable NATs. The UDP mechanism using SNMP traps that we propose in this article is easier to implement and deploy than a TCP mechanism to manage NATed devices because a TCP mechanism could result in substantial system overhead by holding a large number of sessions initiated by hundreds of thousands of devices.

Our Challenges in Avoiding the NAT Traversal Problem

As stated previously, to solve the NAT traversal problem, we designed a connection request mechanism for the NATed device, which enables a DMS to exchange SNMP messages through an unknown NAT.

Based on the well-known NAT traversal mechanism of UDP hole punching, we slightly modified the behaviors of the SNMP manager and the agent and defined additional MOs so that we could effectively solve the NAT traversal issues in a manner that avoids the problems of cost and scalability.

Moreover, we developed an enhanced method to determine the UDP hole binding time for the NAT box and applied it to VoIP devices. Through experiments, we obtained a significantly reliable result that 99 percent of exchanges of SNMP request messages were successful, and the searching time for the default UDP hole binding timer value was reduced by 26 percent.

Most NATs hold an idle timer for a UDP session and close the hole if no traffic is observed for the given time period. Hence, we can reach a device behind a NAT by using the UDP hole punching scheme, in which the SNMP agent sends keep-alive trap messages to the DMS periodically. This enables the DMS to recognize the private/public address and port binding information. Because the SNMP agent at the device usually uses UDP port 161 for the SNMP request message, the binding entry for UDP port 161 must exist in the binding table at the NAT. Generally, any port could be allocated to the source port for sending the SNMP trap message. If an SNMP agent sends the trap message using the fixed UDP port of 161, we can ensure that the binding entry will be maintained in the binding table at the NAT.

On the other hand, there is another problem in managing devices with a private IP address, known as the symmetric NAT problem, whereby a public IP:port can reach a private IP:port only if the traffic to that public IP:port was initiated from the private network. To solve this problem, a DMS uses UDP port 162 to send the SNMP request message because the SNMP agent sends the SNMP trap message to the destination port of 162.

4 This was announced in the pressroom on the MegaTV portal; http://mymegatv.com/pressroom/pressroomList03.asp.

By using these concepts, we propose an SNMP-based remote management method for a device behind a NAT as follows. First, we define the behavior of the SNMP agent embedded in the device. An SNMP agent triggers a UDP hole by periodically sending SNMP trap messages and keep-alive messages to the DMS. We chose 180 seconds for the interval of keep-alive messages because it was the most frequently found value of NATs in our experiments. We also fixed the source port of the SNMP trap message sent from the SNMP agent to UDP port 161. Second, we added a function of gathering the agent IP address and its source port number to the SNMP manager in the DMS. If the agent IP address is different from the source IP address, the SNMP manager decides that the device that has sent the SNMP message is located behind a NAT. To avoid the symmetric NAT problem, the SNMP manager must fix the source port to 162 when sending the SNMP request message.
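The manager-side check described above amounts to comparing two addresses. The following is a minimal sketch under stated assumptions: the function `discover_binding` and the `Binding` type are hypothetical names, the trap is assumed to be already decoded so that the agent's self-reported address is available as a string, and no SNMP library is involved.

```python
from typing import NamedTuple, Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, UDP port)

AGENT_PORT = 161  # fixed source port of the agent, as proposed in the text


class Binding(NamedTuple):
    private: Endpoint  # address and port as seen by the device itself
    public: Endpoint   # address and port as seen by the manager


def discover_binding(client_address: str,
                     packet_source: Endpoint) -> Optional[Binding]:
    """Return the NAT binding implied by a trap, or None if no NAT is detected.

    client_address: the address the agent reports inside the trap.
    packet_source:  the source (IP, port) of the UDP packet carrying the trap.
    """
    source_ip, _source_port = packet_source
    if client_address == source_ip:
        return None  # reported and observed addresses match: not NATed
    return Binding(private=(client_address, AGENT_PORT), public=packet_source)


print(discover_binding("192.168.0.10", ("203.0.113.1", 50000)))
print(discover_binding("198.51.100.9", ("198.51.100.9", 161)))
```

When a binding is returned, the manager would address later requests to `binding.public` and send them from its own port 162, so that they match the hole opened by the trap.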

Proposal of SNMP-Based DM over NAT

Figure 2 shows the message flow associated with the procedures of our proposed method to manage a NATed device by using the UDP hole punching scheme.

In Fig. 2 the address/port pairs use the notation (Address:port). Given the precondition below, there are four steps in our mechanism, as follows:

• Precondition: The DMC uses port 161 as its listening port for receiving SNMP requests from the DMS and as its source port for sending SNMP trap messages. The DMS uses port 162 as its listening port for receiving SNMP trap messages sent from the DMC and as its source port for sending SNMP request messages to the DMC.

• Step 1: Creating a UDP hole: When the IP address (A) is assigned to the device by the NAT, the DMC (A:161) sends the SNMP trap message to the DMS (C:162). The trap includes the device address as a trap object (ClientAddress=A), which provides the private IP address of the device. The NAT translates the IP:port pair (A:161) of the SNMP trap packet to (B:p), which are the IP address and the port number allocated by the NAT, randomly or sequentially. In other words, the NAT creates a UDP hole and a binding entry (A:161, B:p).

• Step 2: Binding discovery: The DMS determines that the device is located behind a NAT when it finds that the address (A) of the device extracted from the SNMP object ClientAddress differs from the source address of the SNMP trap packet. If a device is behind the NAT, the DMS extracts the binding information (A:161, B:p) from the SNMP trap message, whereby the IP:port (A:161) of the device is extracted from the SNMP message, and the IP:port pair (B:p) of the NAT is extracted from the received SNMP trap packet.

Figure 2. The sequential message flow of SNMP device management on the NAT environment. [Diagram not reproduced. Participants: DMC (A:161), NAT middlebox (B:p), DMS (C:162). Private IP address: A; public IP addresses: B, C; ports: 161, 162, p (dynamic). Step 1, creating UDP hole: SNMP trap with source A:161, destination C:162, trap object ClientAddress=A; after the NAT, source B:p, destination C:162, ClientAddress=A. Step 2, binding discovery: the DMS extracts A with (B:p). Step 3, keep punching UDP hole: SNMP trap at hole-timer interval. Step 4, sending SNMP request message: SNMP request with source C:162, destination B:p, passes through the UDP hole and reaches the DMC with source C:162, destination A:161; SNMP response with source A:161, destination C:162, leaves the NAT with source B:p, destination C:162.]


• Step 3: Keep punching the UDP hole: To maintain the UDP hole bound with entry (A:161, B:p) of the NAT, the DMC keeps sending SNMP trap messages to the DMS (C:162) at hole-timer intervals.

• Step 4: Sending SNMP request messages: When the DMS (C:162) wants to manage a NATed device, it sends the SNMP request message to the hole (B:p). The message can then pass through the hole and reach the NATed device: the NAT translates (B:p) to (A:161) according to the binding table. The DMC receives this SNMP message on UDP port 161, and it sends the SNMP response message to the DMS (C:162). The SNMP response message with the result is delivered to the DMS through the same binding, in the reverse direction of the SNMP request message.
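The four steps can be walked through against a toy symmetric NAT. This is a simulation sketch only, not the authors' implementation: `SymmetricNat`, the starting port 50000, and the concrete addresses standing in for A, B, and C are assumptions, and no packets are actually sent.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]
Flow = Tuple[Endpoint, Endpoint]  # (private endpoint, remote endpoint)


class SymmetricNat:
    """Toy symmetric NAT: one public port per (private, remote) endpoint pair."""

    def __init__(self, public_ip: str, first_port: int = 50000) -> None:
        self.public_ip = public_ip
        self.next_port = first_port
        self.by_flow: Dict[Flow, int] = {}
        self.by_port: Dict[int, Flow] = {}

    def outbound(self, src: Endpoint, dst: Endpoint) -> Endpoint:
        """Translate an outgoing packet, creating the binding on first use."""
        flow = (src, dst)
        if flow not in self.by_flow:
            self.by_flow[flow] = self.next_port
            self.by_port[self.next_port] = flow
            self.next_port += 1
        return (self.public_ip, self.by_flow[flow])

    def inbound(self, src: Endpoint, dst: Endpoint) -> Optional[Endpoint]:
        """Translate an incoming packet, or return None if it is dropped."""
        flow = self.by_port.get(dst[1]) if dst[0] == self.public_ip else None
        if flow is None or flow[1] != src:
            return None  # no hole, or the sender is not the bound remote endpoint
        return flow[0]


A, B, C = "192.168.0.10", "203.0.113.1", "198.51.100.7"
nat = SymmetricNat(public_ip=B)

# Step 1: the DMC sends a trap from A:161 to C:162; the NAT creates (A:161, B:p).
trap_source = nat.outbound((A, 161), (C, 162))

# Step 2: the DMS compares ClientAddress=A with the packet source address B.
behind_nat = A != trap_source[0]

# Step 3: a later keep-alive trap reuses the same binding.
refreshed = nat.outbound((A, 161), (C, 162))

# Step 4: a request sent from C:162 to B:p passes through the hole ...
delivered = nat.inbound((C, 162), trap_source)
# ... whereas the same request sent from another source port is dropped.
dropped = nat.inbound((C, 40162), trap_source)

print(trap_source, behind_nat, refreshed == trap_source, delivered, dropped)
```

The dropped case illustrates why the manager's source port is fixed to 162: under a symmetric NAT, only the exact remote endpoint that the trap was sent to can use the hole.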

Heuristic to Estimate the UDP Hole Punching Timer Values

In general, UDP mapping timer values are not standardized, so they could be different for each NAT vendor. For the remote management of devices behind a NAT from a public network, the DMS should make the user device send a UDP packet periodically before the UDP hole is closed. In other words, the device should punch the UDP hole periodically at the time interval configured by the DMS. Note that searching for the UDP mapping time could place a considerable load on the DMS in a large-scale network because the DMS must send many probe packets with the estimated timeout values for each NAT box.

As such, we propose a heuristic method to maintain the list of the top 10 UDP mapping times statistically obtained through experiments. Then, we applied the binary search algorithm to the list of the top 10 known timer values. That is, a DMS uses the binary search algorithm to find the UDP mapping time in the list and then to search it between two items. Table 2 summarizes four kinds of applicable search methods. The experimental results are explained in the next section.
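One way to read the heuristic is sketched below, alongside plain binary search for comparison. This is an interpretation under assumptions, not the authors' code: the probe is simulated by `CountingProbe` (in the field it would mean waiting idle and then sending an SNMP command), the list `TOPN`, the search bounds, and the tolerable error are made-up values, and the search assumes the hole survives `lo` idle seconds but not `hi`.

```python
from typing import Callable, List

Probe = Callable[[float], bool]  # True if the hole survived `wait` idle seconds


class CountingProbe:
    """Simulated probe against a NAT whose true idle timer is `timeout` seconds."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.probes = 0      # number of probes sent
        self.waited = 0.0    # total idle time spent waiting, in seconds

    def __call__(self, wait: float) -> bool:
        self.probes += 1
        self.waited += wait
        return wait <= self.timeout


def binary_search(probe: Probe, lo: float, hi: float, te: float) -> float:
    """Narrow [lo, hi] until it is no wider than te; return the lower bound."""
    while hi - lo > te:
        ev = (lo + hi) / 2   # EV = (MinVal + MaxVal) / 2
        if probe(ev):
            lo = ev          # hole still open: the timer is at least EV
        else:
            hi = ev          # hole closed: the timer is below EV
    return lo


def topn_binary_search(probe: Probe, topn: List[float],
                       lo: float, hi: float, te: float) -> float:
    """Binary search over known timer values, then between two neighbours."""
    candidates = [lo] + sorted(v for v in topn if lo < v < hi) + [hi]
    left, right = 0, len(candidates) - 1
    while right - left > 1:
        mid = (left + right) // 2
        if probe(candidates[mid]):
            left = mid
        else:
            right = mid
    return binary_search(probe, candidates[left], candidates[right], te)


TOPN = [30.0, 60.0, 120.0, 180.0, 300.0, 600.0]  # hypothetical known timer values

plain = CountingProbe(timeout=180.0)
plain_estimate = binary_search(plain, lo=0.0, hi=1200.0, te=10.0)

topn = CountingProbe(timeout=180.0)
topn_estimate = topn_binary_search(topn, TOPN, lo=0.0, hi=1200.0, te=10.0)

print(plain_estimate, plain.probes, plain.waited)
print(topn_estimate, topn.probes, topn.waited)
```

Because each probe costs its waiting time, starting from frequently observed timer values tends to keep the waits short when the true timer is one of them. How much is saved depends on the list contents, the bounds, and the tolerable error, so this toy run should not be read as reproducing the measured results.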

Experimental Results

To evaluate the proposed method, we implemented an SNMP manager and defined the client behavior. With the implementation, we tested our method in a nationwide high-speed broadband network, as shown in Fig. 3. Of 1177 manageable VoIP devices, 194 hosts are shown to be NATed in our network; thus, roughly 17 percent of end hosts are NATed. From these hosts, we randomly selected 22 devices and tested our proposed method. We chose only a small number of devices because we had to limit testing to the minimum number of devices so as not to affect customer service when sending command messages repeatedly.

The SNMP manager was implemented based on University of California, Davis (UCD) SNMP version 4.2.6,⁵ and can send and receive SNMP messages simultaneously using one port, UDP 162. We also embedded an SNMP agent implementing our proposed method into the device.

Table 3 shows the results of different methods of searching for the UDP hole time. Our heuristic, based on the binary search method, showed the best performance in the experiments. Compared with the popular binary search method, our heuristic could reduce search times by 26 percent, as well as reduce the average number of probes by 0.6.

Table 4 shows the command success ratio of our proposed method. We could achieve a 99 percent success rate of SNMP command penetration into the NATed devices. This result provides compelling evidence that it is possible to manage a device using private IP addresses, without any additional servers or equipment. It also shows that the TopN binary search heuristic is useful for the efficient management of NATed hosts.
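The headline figures follow directly from the values reported in Tables 3 and 4. The short check below simply redoes that arithmetic; the helper name `percent` is ours.

```python
def percent(part: float, whole: float) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Table 3: average search time (s) and average number of probes.
binary_time, topn_time = 470.0, 348.0
binary_probes, topn_probes = 2.9, 2.3
time_reduction = percent(binary_time - topn_time, binary_time)
probe_reduction = binary_probes - topn_probes

# Table 4: successful commands out of commands tested.
fixed_timer_ratio = percent(1636, 2160)
topn_ratio = percent(1207, 1221)

print(f"{time_reduction:.1f} {probe_reduction:.1f} "
      f"{fixed_timer_ratio:.1f} {topn_ratio:.1f}")
# prints: 26.0 0.6 75.7 98.9
```

The 26 percent improvement is therefore relative to plain binary search, and the 99 percent success ratio is 1207 successes in 1221 tests, rounded.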

Our NAT traversal success rate of 99 percent might be questioned when compared with the well-known result of 80 percent in [12]. First, we think that the small number of devices in the experiment (22 devices) could be contributing to the high success ratio of passing through NATs, compared to the experiment in [12] with 40 different kinds of NATs. We could not perform experiments with a large number of subscribers because the testing should not affect ongoing service. It also is possible that the failure rate of 1 percent is due to multiple NATs or abnormal NATs.

Table 2. Methods for searching for the UDP hole timer values of a NAT (EV: expected value, IV: initial value, TE: tolerable error, PEV: previous expected value).

| Method | Concept | Algorithm |
|---|---|---|
| Linear search | Search UDP mapping time with a linearly increasing value of EV | 1) EV = IV + TE; 2) Wait for EV and send SNMP command |
| Slow start search | Similar method as TCP congestion control; search UDP mapping time by increasing EV exponentially, but after failure perform a linear search increase | 1) EV = IV + PEV*2; 2) Wait until EV and send SNMP command; 3) If success, go to 1); if fail, do linear search |
| Binary search | Use binary search method | 1) EV = (MinVal + MaxVal)/2; 2) Wait until EV and send SNMP command; 3) If success, MinVal = EV and go to 1); if fail, MaxVal = EV and go to 1) |
| TopN binary search | Limited types of NAT vendors are deployed in the real environment; maintain TopN list of UDP hole time; based on the TopN list, first perform binary search between entries | 1) Binary search in the TopN list; 2) Binary search between TopN(i) and TopN(i+1) until the difference is within TE |

Table 3. Average time of searching UDP hole timer values for different NATs.

| Method | Devices | Tests | Average number of probes | Average time (s) |
|---|---|---|---|---|
| Linear search | 22 | 196 | 25.6 | 4608 |
| Slow start search | 22 | 275 | 19.2 | 2984 |
| Binary search | 22 | 488 | 2.9 | 470 |
| TopN binary search | 22 | 541 | 2.3 | 348 |

5 The current release version of UCD SNMP is NET-SNMP 5.4.1; http://net-snmp.sourceforge.net

Comparison of DM Approaches under NAT

In this section, we present a brief comparative discussion of various approaches to realizing remote device management functions under NATs, as summarized in Table 5.

Using STUN servers requires an additional server, as well as client modules to support the mechanism, thus demanding more overhead in the aspects of implementation complexity, scalability, fault tolerance, security, deployment cost, command response time, system load, and compatibility. DHCP and UPnP are available only in environments where NATs are compatible with TR-111. Moreover, as mentioned before, CDM still has no standardization result. However, our proposal based on SNMP shows manageable advantages when compared with other methods.

Conclusion and Future Work

As the number of NATs deployed in the broadband network grows, more and more IP devices will be hidden behind a NAT. Therefore, it is necessary for a DMS to find a connection request mechanism for NATed devices so that it can exchange messages through an unknown NAT. When we apply the known NAT traversal solutions to the real environment, we encounter new challenges, such as expensive maintenance costs and symmetric NAT problems. The problem of expensive maintenance costs is related to the additional servers that must be deployed, or the NATs that must be upgraded to support a remote connection request mechanism. The problem of symmetric NATs is that the NAT traversal mechanism must work under all different kinds of NATs, including symmetric NATs.

In this article, we presented a simple overview of early standardization efforts and proposed an effective remote SNMP connection request mechanism for NATed devices using the UDP hole punching method. By slightly modifying the behaviors of the SNMP manager and the agent, and by defining additional management objects to gather NAT binding information, we solved the cost problem and symmetric NAT issue. In addition, we proposed an enhanced method to efficiently determine the binding time of the UDP holes of the NAT box. For the experimental evaluation, we applied our method to 22 VoIP devices behind NATs in the real environment and achieved a success ratio of 99 percent in exchanging SNMP request messages and a 26 percent enhancement in determining the UDP hole binding time. Even though the proposed DM protocol is to be changed to SNMP v3 in the future, we believe that this would necessitate only a slight change in our scheme.

Table 4. SNMP command success ratio for NATed devices.

| Hole punching method | Searching hole timer | Devices | Tests | Successes | Success ratio (%) |
|---|---|---|---|---|---|
| No hole punching | - | 22 | NA | NA | 0 |
| Hole punching | Fixed timer of 180 s | 22 | 2160 | 1636 | 75 |
| Hole punching | TopN binary search | 19 | 1221 | 1207 | 99 |

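Table 5. Comparisons of device management methods (ratings: good, average, bad).

| Criterion | TR-111 (STUN) | TR-111 (DHCP) | TR-111 (UPnP) | CDM | Our proposal |
|---|---|---|---|---|---|
| Implementation complexity | bad | good | good | NA | good |
| Scalability | average | good | good | NA | average |
| Fault tolerance | bad | good | good | NA | good |
| Security | good | average | average | NA | average |
| Deployment cost | bad | average | average | NA | good |
| Command response time | bad | average | average | NA | good |
| System load | bad | good | good | NA | average |
| Compatibility | average | bad | bad | NA | good |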

- Implementation complexity: Our proposal employs a NAT traversal mechanism by slightly changing SNMP-based DM software. However, STUN requires an additional dedicated server and a client in addition to the DM software.
- Scalability: The periodic hole punching method, which increases the number of in-flight packets in proportion to the number of devices, may cause a scalability problem like STUN. However, our proposal uses fewer and smaller packets than STUN.
- Fault tolerance: Our proposal has fewer points of failure that affect the overall availability of the system, whereas TR-111 with STUN needs an additional STUN server and a client for NAT traversal.
- Security: Our proposal may be affected by source address spoofing attacks, like DHCP and UPnP.
- Deployment cost: Apart from a slight change to the SNMP trap mechanism, our proposal does not require an additional deployment cost. On the other hand, for TR-111, STUN needs dedicated clients and servers, and DHCP or UPnP functions should be deployed on all NATs.
- Command response time: Our proposal could reduce the command response time by estimating the UDP hole timer values correctly. However, STUN might experience a long command response time because of additional intermediate nodes like STUN servers and clients.
- System load: Our proposal will result in less server load to maintain the NAT traversal mechanism because it uses simple UDP packets. However, STUN uses a somewhat complex mechanism that exchanges many UDP and TCP packets for NAT traversal.
- Compatibility: Our proposal is compatible with all NAT environments, including symmetric NAT. However, STUN is an independent standard, separate from device management, and needs additional considerations for compatibility with the legacy system.


There are some open issues in our system. One is scalability: our system must handle thousands of keep-alive SNMP trap messages per minute from thousands of devices. We are therefore investigating a time-to-live (TTL)-based scheme that avoids heavy traffic that does not need to reach a DMS, in which periodic trap messages are sent with TTL = n (n being the smallest TTL value that still punches the hole). This scheme will be discussed in future work. Another issue is security, which arises from SNMP v2. To address it, we are considering changing the DM protocol in the near future to one of the standards-based secure DM protocols, such as SNMP v3, OMA-DM, or TR-069. In future work, we will also estimate and analyze real-environment results in our large-scale VoIP and IPTV network, which will be widely deployed this year and reach over a million devices.

Acknowledgment

This work was partly supported by the IT R&D program of MKE/IITA (2008-F-016-01, Collect, Analyze, and Share for Future Internet) and partly by the ITRC (Information Technology Research Center) support program of MKE/IITA (IITA-2008-C1090-0801-0016). The corresponding author is Youngseok Lee.

References

[1] M. Holdrege, “IP Network Address Translator (NAT) Terminology and Considerations,” IETF RFC 2663, Aug. 1999.
[2] S. Guha and P. Francis, “Characterization and Measurement of TCP Traversal through NATs and Firewalls,” Proc. Internet Measurement Conf., Berkeley, CA, Oct. 2005.
[3] OMA, “OMA Device Management V1.2 Approved Enabler,” Feb. 2007.
[4] DSL Forum, “CPE WAN Management Protocol v1.1,” Dec. 2007.
[5] J. Case et al., “Introduction and Applicability Statements for Internet Standard Management Framework,” IETF RFC 3410, Oct. 2000.
[6] W. Stallings, SNMP, SNMPv2, SNMPv3, and RMON 1 and 2, 3rd ed., Addison Wesley, 1998.
[7] D. Raz, J. Schoenwaelder, and B. Sugla, “An SNMP Application Level Gateway for Payload Address Translation,” IETF RFC 2962, Oct. 2000.
[8] E. Lear, “Simple Firewall Traversal Mechanisms and Their Pitfalls,” IETF draft, Oct. 2005.
[9] DSL Forum, “Technical Report 111, Applying TR-069 to Remote Management of Home Networking Devices,” Dec. 2005.
[10] J. Rosenberg et al., “STUN - Simple Traversal of User Datagram Protocol through Network Address Translators,” IETF RFC 3489, Mar. 2003.
[11] B. Ford and P. Srisuresh, “Peer-to-Peer Communication across Network Address Translators,” USENIX Annual Technical Conf., 2005.
[12] C. Jennings, “NAT Classification Test Results,” IETF draft, July 2007.

Biographies

CHOONGUL PARK ([email protected]) received B.S. and M.S. degrees in computer engineering in 2001 from Pusan National University and in 2008 from Chungnam National University, Korea, respectively. Currently, he is a Ph.D. student at Chungnam National University. He joined KT Technology Laboratory in 2002 and started his research work on the Next Generation OSS project. Since 2005 he has been a member of the KT Device Management project and a senior researcher in the Department of Next Generation Network Research. His research interests include device management and traffic engineering in the next-generation Internet.

KITAE JEONG ([email protected]) received B.S. and M.S. degrees in 1983 and 1986 in electronic engineering from Kyungpook National University, and a Ph.D. from Tohoku University of Japan in 1996. He joined KT Laboratory in 1986, and is the leader of the Department of Next Generation Network Research. His research interests are in the fields of device management, next-generation network, and fiber to the home.

SUNGIL KIM ([email protected]) received B.S. and M.S. degrees in 1992 and 1994 in computer engineering from Choongbuk National University. He joined KT Technology Laboratory in 1994, and is the leader of the KT Device Management project and delegate to the Broadband Convergence Network Standardization Group. His research interests are in the fields of device management and next-generation networks.

YOUNGSEOK LEE [SM] ([email protected]) received B.S., M.S., and Ph.D. degrees in 1995, 1997, and 2002, respectively, all in computer engineering, from Seoul National University, Korea. He was a visiting scholar at Networks Lab at the University of California, Davis from October 2002 to July 2003. In July 2003 he joined the Department of Computer Engineering, Chungnam National University. His research interests include Internet traffic measurement and analysis, traffic engineering in next-generation Internet, wireless mesh networks, and wireless LAN.

Figure 3. Test environment of estimating UDP hole time at the commercial broadband network in Korea. [Diagram: the DMS is located in KT's backbone network (KORNET); VoIP phones running a DMC sit in home networks behind NATs and modems and reach the backbone over Ethernet (FES), xDSL (DSLAM), and FTTH (OLT) access networks; 194 hosts exist under NATs out of 1,177 VoIP phones.]


Improving the Performance of Route Control Middleboxes in a Competitive Environment

Marcelo Yannuzzi, Xavi Masip-Bruin, Eva Marin-Tordera, Jordi Domingo-Pascual, Technical University of Catalonia
Alexandre Fonte, Polytechnic Institute of Castelo Branco
Edmundo Monteiro, University of Coimbra

Abstract

Multihomed subscribers are increasingly adopting intelligent route control solutions to optimize the cost and end-to-end performance of the traffic routed among the different links connecting their networks to the Internet. Until recently, IRC practices were not considered adverse, but new studies show that in a competitive environment, they can lead to persistent traffic oscillations, causing significant performance degradation rather than improvements. To cope with this, randomized IRC techniques were proposed. However, the proliferation of IRC products raises concerns, given that randomization becomes less effective as the number of interfering IRC systems increases. In this article, we present a more scalable route control strategy that can better support the foreseeable spread of IRC solutions. We show that by blending randomization with adaptive filtering techniques, it is possible to drastically reduce the interference between competing route controllers, and this can be achieved without penalizing the end-to-end traffic performance. In addition to the potential improvements in terms of scalability and performance, the route control strategy outlined here has various practical advantages. For instance, it does not require any kind of protocol or coordination between the competing IRC middleboxes, and it can be adopted readily today because the only requirement is a software upgrade of the available route controllers.

This work was partially funded by the European Commission through CONTENT under contract FP6-0384239.

Today, the vast majority of the communications on the Internet are between nodes located in non-transit (i.e., stub) networks. Stub networks are primarily composed of medium and large enterprise customers, universities, public administrations, content service providers (CSPs), and small Internet service providers (ISPs). These networks exploit a widespread practice called multihoming, which consists of using multiple external links to connect to different transit providers. By increasing their connectivity to the Internet, stub networks potentially can obtain several benefits, especially in terms of resilience, cost, and traffic performance [1]. These are described as potential benefits because multihoming per se does not improve resilience, cost, or traffic performance. Accordingly, multihomed stub networks require additional mechanisms to achieve these improvements. In particular, when an automatic mechanism actively optimizes the cost and end-to-end performance of the traffic routed among the different links connecting a multihomed stub network to the Internet, it is referred to as intelligent route control (IRC).

During the last few years, IRC has attracted significant interest in both the research and the commercial fields. Several vendors are developing and offering IRC solutions [2–4] that increasingly are being adopted by multihomed stub networks. Most available IRC solutions follow the same principle, that is, they dynamically shift part of the egress traffic of a multihomed subscriber from one of its ISPs to another, using measurement-driven path switching techniques. IRC systems operate in relatively short timescales (even reaching switching frequencies on the order of a few seconds), allowing IRC users to balance cost and performance criteria according to the priority and requirements of their applications.

Despite these strengths, IRC practices have one major weakness, that is, they try to achieve a set of local objectives individually without considering the effects of their decisions on the performance of the network. Recently, it was discovered that in a competitive environment, IRC systems actually can cause significant performance degradation rather than improvement. In [5], the authors show that persistent oscillations can occur when independent controllers become synchronized due to a considerable overlap in their measurement time windows. To avoid synchronization issues, the authors propose randomized IRC strategies and empirically show that the oscillations disappear after introducing a random component in the route control decision.

It is important to note that although randomization offers a straightforward mechanism to mitigate the oscillations, it cannot guarantee global stability. This issue raises concerns given the proliferation of IRC products because as the number of interfering IRC systems increases, randomization becomes less effective, and hence the oscillations are more likely to reappear. In light of this, it is necessary to explore more scalable route control strategies that can safely support the foreseeable spread of IRC solutions.

In principle, two research approaches can be taken. On the one hand, the research community could formally study the stability properties of IRC practices and provide guidelines on how to design IRC systems with guaranteed stability. Unfortunately, several challenging stages must be completed properly before a formal study of stability can be conducted. For instance, accurate measurements are required to understand comprehensively the actions of the closed-source IRC systems deployed today (e.g., [2–4]) and thereby model the stochastic distribution of path switches in a competitive IRC environment. Only after characterizing the distribution of path switches is it possible to formally study the stability aspects of competitive IRC.

In the absence of such characterization, the practical alternative is to find ways to drastically reduce the potential interference between competing route controllers without penalizing the end-to-end traffic performance. This is precisely the challenge addressed in this work. This article makes the following contributions:
• We show that although randomization offers a straightforward way to mitigate the oscillations, it leads to a large number of unnecessary path switches.
• We report some of our recent results on the development of strategies blending randomization with a lightweight and more “sociable” route-control algorithm. The term sociable route control (SRC) refers here to a route control strategy that explicitly considers the potential implications of its decisions in the performance of the network and can adaptively restrain its intrinsic selfishness depending on the network conditions.
• We show that a simple enhancement to randomized IRC systems, such as endowing them with an SRC algorithm supported by adaptive filtering techniques, is enough to drastically reduce the number of path switches, and most importantly, this can be accomplished without penalizing the end-to-end traffic performance. Extensive simulations show that with SRC, it is possible to reduce the overall number of path switches by approximately 40 to 80 percent on average (depending on the load on the network) and still obtain better end-to-end traffic performance than with randomized IRC techniques in a competitive environment.

The rest of the article is structured as follows. First, we present the basics of IRC. Then, we overview the most relevant related work. Next, we analyze some general aspects of different IRC strategies and describe the SRC approach together with some of our main results. We conclude with directions for future research in the area of IRC.

The Basics of IRC

A typical IRC scenario with two different configurations is shown in Fig. 1. The IRC box at the top of Fig. 1 is connected by a span port off a router or switch, so although the egress traffic is controlled by the box, it is never forwarded through it. The IRC box in the multihomed network at the bottom of Fig. 1 is placed along the data path, so traffic always is forwarded through it. Typically, the former configuration offers a more scalable solution than the latter, in the sense that it is able to control and optimize a larger number of traffic flows.

Conceptually, an IRC system is composed of the following three modules (Fig. 1):
• Monitoring and measurement module (MMM)
• Route control module (RCM)
• Reporting and viewer module (RVM)

The existing IRC systems can control a moderately large number of flows¹ toward a set of target destination networks. These target destinations can be configured manually or discovered by means of passive measurements performed by the MMM. By using passive measurements, the MMM can rank the destinations according to the amount of traffic sourced from the local network and subsequently optimize the performance for the traffic toward the D destinations at the top of the rank. The MMM also uses passive measurements to monitor the target flows in real time and analyze packet losses, latency, and retransmissions, among others, as indicators of conformance or degradation of the expected traffic performance. To assist the RCM in the dynamic selection of the best egress link to reach each target destination, the MMM probes all the candidate paths using both Internet Control Message Protocol (ICMP) and Transmission Control Protocol (TCP) probes.
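The ranking step performed by the MMM can be illustrated with a minimal Python sketch. It assumes that the passive measurements are available as (destination, bytes) pairs; the function name `top_destinations` and the record format are illustrative assumptions and are not taken from any IRC product.

```python
from collections import Counter

def top_destinations(flow_records, D):
    """Return the D destinations that receive the most traffic.

    flow_records: iterable of (destination, bytes_sent) pairs, a
    hypothetical stand-in for the MMM's passive measurements.
    """
    volume = Counter()
    for destination, bytes_sent in flow_records:
        volume[destination] += bytes_sent
    # most_common() sorts destinations by descending traffic volume.
    return [destination for destination, _ in volume.most_common(D)]

# Illustrative records only.
records = [("10.1.0.0/16", 500), ("10.2.0.0/16", 1200),
           ("10.1.0.0/16", 900), ("10.3.0.0/16", 100)]
print(top_destinations(records, 2))
```

Only the destinations returned by such a ranking would then be probed and optimized by the controller.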

Figure 1. The IRC model. IRC systems are composed of three modules: the monitoring and measurement module (MMM), the route control module (RCM), and a reporting and viewer module (RVM). [Diagram labels: two IRC boxes, each containing the MMM, RCM, and RVM; "Enforce routing decision"; providers ISP11, ISP12, ..., ISP1n and ISP21, ISP22, ..., ISP2m.]

¹ Typically this is on the order of several hundreds and even thousands, using a configuration like the one shown at the top of Fig. 1 with several border routers.

The set of active and passive measurements collected by the MMM enables IRC systems to concurrently assess the quality of the active and the alternative paths toward the target destinations. The role of the RCM is to dynamically choose the best egress link for each target flow, depending on the outcome of these measurements. More specifically, the RCM is capable of taking rapid routing decisions for the target flows, often avoiding the effects of issues such as distant link/node failures² or performance degradation due to congestion.³

The third module of an IRC system, namely the RVM, typically supports a broad set of reporting options and provides online information about the average latency, jitter, bandwidth utilization, and packet loss experienced through the different providers, summaries of traffic usage, associated costs for each provider, and so on.

Overall, IRC offers an incremental approach, compensating for some of the key deficiencies of the Interior Gateway Protocol/Border Gateway Protocol (IGP/BGP)-based route control model. It is worth emphasizing that the set of candidate routes to be probed by IRC boxes usually is determined by IGP/BGP; so, contrary to overlay networks [8], IRC boxes never circumvent IGP/BGP routing protocols. The effectiveness of multihoming in combination with IRC is confirmed not only by studies like [8], but also by the increased trend in the deployment of these solutions.

In this article we deal with the algorithmic aspects of IRC systems, so hereafter we focus our attention on the RCM in Fig. 1; the functionality of the MMM and RVM modules essentially is orthogonal to the proposals made in this work.

Related Work

In [9], the authors simultaneously optimize the cost and performance for multihomed stub networks by introducing a series of new IRC algorithms. The contributions of that work are fundamentally theoretical. For instance, the authors show that an intelligent route controller can improve its own performance without adversely affecting other controllers in a competitive environment, but the conclusions are drawn at traffic equilibria (traffic equilibrium is defined by the authors as a state in which no traffic can improve its latency by unilaterally changing its link assignment). However, after examining and modeling the key features of conventional IRC systems, it becomes clear that they do not seek this type of traffic equilibria. Indeed, more recent studies, such as [5], show that in practice, the performance penalties can be large, especially when the network utilization increases.

In light of this, and considering the current deployment trend of IRC solutions, it becomes necessary to explore alternative IRC strategies. These new route control strategies should always improve the performance and reliability of the target flows, or at least, they should drastically reduce the potential implications associated with frequent traffic relocations, such as persistent oscillations causing packet losses and increased packet delays [5].

Although most commercially available IRC solutions do not reveal in depth the technical details of their internal operation and route control decisions, the behavior of one particular controller is described in detail in [10]. That work also provides measurements that evaluate the effectiveness of different design decisions and load balancing algorithms. Akella et al. also provided rather detailed descriptions and experimental evaluations of multihoming in combination with IRC tools, as in [1, 8, 11]. These research publications, along with the documentation provided by vendors, allowed us to capture and model the key features of conventional IRC techniques. A similar approach was followed by the authors in [5]. For simplicity, and as in [5, 8, 10], we consider traffic performance as the only criterion to be optimized for the target flows.⁴

The General IRC Network Model

The general IRC network model is composed of a multihomed stub network S, a route controller CC, the transit domains, and a set of target destinations {d} with cardinality |d| = D to be optimized by CC. The source domain S has a set of egress links {e}, with |e| = E. For the sake of simplicity, we keep the notation in the granularity of destinations (d), but the model easily can be extended to consider various flows per target d.

To dynamically decide the best egress link for each target destination d, the MMM in CC probes all the candidate paths through the egress links e of S. Then, the collected measurements are processed and abstracted into a performance function $P_e^{(d,t)}$ at time t, associated with the quality perceived for each of the available paths toward the target destinations d. Let N(d) denote the number of available paths to reach d. Because N(d) usually represents the number of candidate paths in the forwarding information base (FIB) of the BGP border routers of S, $N(d) \leq E \;\; \forall\, d$.

We assume that the better the end-to-end traffic performance perceived by CC for a target destination d through egress link e, the lower the value of the performance function $P_e^{(d,t)}$.

In this framework, IRC strategies can be classified into two categories, namely, reactive route control (RRC) and proactive route control (PRC). RRC practices switch a target flow from one egress link to another only when a maximum tolerable threshold (MTT) is met. The MTTs are application-specific and typically represent the maximum acceptable packet loss, the maximum tolerated packet delay, and so on, for a given application. Beyond any of these bounds, the performance perceived by the users of the application becomes unacceptable.
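As a rough illustration of the reactive rule described above, the following Python sketch relocates a flow only once some measured metric reaches its MTT. The metric names and the threshold values are hypothetical and chosen only for the example; they are not taken from the article.

```python
# Hypothetical MTTs for a VoIP-like application (illustrative values only).
MTT = {"delay_ms": 150.0, "loss_pct": 1.0}

def rrc_should_switch(measured, mtt=MTT):
    """Reactive rule: relocate only when some metric has reached its MTT."""
    return any(measured[metric] >= bound for metric, bound in mtt.items())

print(rrc_should_switch({"delay_ms": 120.0, "loss_pct": 0.2}))  # False
print(rrc_should_switch({"delay_ms": 155.0, "loss_pct": 0.2}))  # True
```

Under this rule nothing happens while the path merely degrades, which is why the users may already perceive poor quality by the time a switch is triggered.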

PRC strategies, on the other hand, switch traffic before any of the MTTs are met and, in turn, can be classified into two categories: those that can be called fully proactive (FP), and those that follow a controlled proactivity (CP) approach. FP IRC practices always switch to the best path. Therefore, the dynamic optimization problem addressed by an FP route controller is to:

Find $\min\{P_e^{(d,t)}\}$ $\forall\, d, t$ and enforce the redirection of the corresponding traffic to the egress link found.
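For a single destination d at one time instant, the FP problem stated above reduces to picking the egress link with the lowest performance value. A minimal Python sketch follows; the dictionary layout and the function name `fp_select` are assumptions made for illustration.

```python
def fp_select(perf):
    """Fully proactive rule: return the egress link with the lowest P_e^(d,t).

    perf maps each candidate egress link to its current performance value
    for one destination d (lower is better).
    """
    return min(perf, key=perf.get)

perf_d = {"e1": 7, "e2": 6, "e3": 9}  # illustrative values
print(fp_select(perf_d))              # e2
```

Because the minimum is re-evaluated at every measurement instant, even a marginal difference between two links is enough to move the traffic.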

The alternative offered by CP is to keep the proactivity, but switch traffic as soon as the performance becomes degraded to some extent, typically represented by a relocation threshold ($R_{th}$). The dynamic optimization problem addressed by CP-based strategies can be formulated as follows.

Let $e_{best}$ denote the egress link utilized to reach d at time t, and let $e'$ be such that $P_{e'}^{(d,t)} = \min\{P_e^{(d,t)}\}$ for destination d at time t.⁵ A CP-based route controller would switch traffic to d from $e_{best}$ to $e'$ whenever $P_{e_{best}}^{(d,t)} - P_{e'}^{(d,t)} \geq R_{th}$, with $R_{th} > 0$.

² The timescale required by IRC systems to detect and react to a distant link/node failure is very small compared to that of the general IGP/BGP routing system [2–4, 6].

³ This cannot be automatically detected and avoided with BGP [7].

⁴ Cost reductions are typically accomplished by aggregating traffic toward non-target destinations over the cheapest ISPs.


After extensive evaluations and analysis, we confirmed that PRC performs much better than RRC. The reason is that proactive approaches can anticipate network congestion, whereas reactive approaches typically require several traffic relocations once congestion has already been reached. In addition, we found that in a competitive environment, CP-based route control strategies can outperform the FP ones. Therefore, our SRC algorithm (outlined in the following section) is supported by a CP-based route control strategy.
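The CP switching condition on which the SRC algorithm builds can be sketched in a few lines of Python. The sketch assumes that the performance values of all candidate egress links for one destination are held in a dictionary; the function name `cp_select` and the sample values are illustrative only.

```python
def cp_select(perf, e_best, r_th):
    """Controlled-proactivity rule for one destination d.

    perf:   maps each egress link to its performance value (lower is better)
    e_best: egress link currently used to reach d
    r_th:   relocation threshold, which must be positive
    """
    if r_th <= 0:
        raise ValueError("R_th must be positive")
    e_prime = min(perf, key=perf.get)  # best available egress link
    # Switch only if the current link is worse by at least R_th.
    if perf[e_best] - perf[e_prime] >= r_th:
        return e_prime
    return e_best

perf_d = {"e1": 7, "e2": 6, "e3": 9}    # illustrative values
print(cp_select(perf_d, "e1", r_th=2))  # e1: a gain of 1 is below the threshold
print(cp_select(perf_d, "e3", r_th=2))  # e2: a gain of 3 reaches the threshold
```

Compared with an FP controller, which would move the first flow from e1 to e2, the threshold suppresses relocations whose expected gain is small.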

Sociable Route Control

In the SRC strategy that we conceive, each controller remains independent, so the SRC boxes do not require any kind of coordination with one another, just as conventional IRC systems operate today. Moreover, our SRC strategy does not introduce changes in the way measurements are conducted and reported by conventional IRC systems, so both the MMM and the RVM in Fig. 1 remain unmodified. Our SRC strategy introduces changes only on the algorithmic aspects of the RCM.

High-Level Description of the SRC Strategy

For simplicity in the exposition, we focus on the optimization of a single application, namely, voice over IP (VoIP), and we describe the overall SRC process for the round-trip time (RTT) performance metric. For a comprehensive and formal analysis, the reader is referred to [12].

Our goal is that a controller CC becomes capable of adaptively adjusting its proactivity, depending on the RTT conditions for each target destination d. To be precise, a sociable controller analyzes the evolution of the RTT, that is, $\{RTT_e^{(d,t)}\}$, and depending on its dynamics, the controller can restrain its traffic reassignments adaptively (i.e., its proactivity). To this end, the RCM processes the RTT samples gathered from the MMM using two filters in cascade (Fig. 2). The first filter corresponds to the median RTT, $M_e^{(d,t)}$, which is constantly computed through a sliding window. This approach is used widely in practice because the median represents a good estimator of the delay that the users' applications currently are experiencing in the network. These medians are precisely the input to the second filter, where the social nature of the route control algorithm covers two different facets:
• CP
• SRC
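The first filter, a median computed over a sliding window of RTT samples, can be sketched as follows in Python. The class name `MedianFilter` and the window length are assumptions made for the example; the article does not state the window size used.

```python
from collections import deque
from statistics import median

class MedianFilter:
    """Sliding-window median of RTT samples for one (egress, destination) pair."""

    def __init__(self, window):
        # deque(maxlen=...) silently drops the oldest sample when full.
        self.samples = deque(maxlen=window)

    def update(self, rtt_ms):
        """Add a new RTT sample and return the current median."""
        self.samples.append(rtt_ms)
        return median(self.samples)

f = MedianFilter(window=3)
for rtt in (100, 102, 300, 101):   # 300 ms is an isolated spike
    print(f.update(rtt))
```

In this run the isolated 300 ms sample never becomes the output of the filter, which illustrates why the median is a robust estimator of the delay currently experienced.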

Controlled Proactivity

On the one hand, the proactivity of box CC is controlled to avoid minor changes in the medians triggering traffic relocations at S. This prevents interfering too often with other route controllers. For this reason, our sociable controllers filter the medians.

The second filter in Fig. 2 works like an analog-to-digital (A/D) converter, with quantization step ∆, and its output is one of the levels of the converter, $Q_e^{(d,t)}$. The right-hand side of Fig. 2 illustrates how the instantaneous samples of RTT are filtered to obtain the median $M_e^{(d,t)}$, and then, the latter is filtered to obtain $Q_e^{(d,t)}$.

As described earlier, IRC systems compare the quality of the active and alternative paths by means of a performance function $P_e^{(d,t)}$, which, as shown in Fig. 2, is fed by $Q_e^{(d,t)}$. The controller CC would switch traffic toward d only when the variations of $Q_e^{(d,t)}$ cause that $P_{e_{best}}^{(d,t)} - P_{e'}^{(d,t)} \geq R_{th}$. A more detailed description of the route selection process is shown in Algorithm 1. For simplicity, only the stationary operation of the algorithm is summarized. The randomized nature of Algorithm 1 is discussed later. The timer in Step 8 is also introduced later.

For the RCM described here, we simply used the outcome of the digital conversion as the performance function $P_e^{(d,t)}$, that is, the number of quantization steps in the quantization level $Q_e^{(d,t)}$. Similarly, $R_{th}$ represents the difference in the number of quantization steps that $P_e^{(d,t)}$ must reach to trigger a path switch.

Overall, the advantage of this filtering technique is that it produces the desired effect (i.e., controlled proactivity) because it prevents minor changes in the medians from triggering unnecessary traffic relocations at S.
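A minimal Python sketch of this second filter is given below. It assumes that the quantization level is obtained by flooring the median to a whole number of steps; the article describes the filter only as behaving like an A/D converter, so the exact rounding rule, the function names, and the sample values are assumptions.

```python
def quantize(median_rtt_ms, delta_ms):
    """Number of whole quantization steps contained in the median RTT.

    Flooring is an assumption; the source does not state the rounding rule.
    """
    return int(median_rtt_ms // delta_ms)

def should_switch(median_active, median_alt, delta_ms, r_th):
    """Switch only if the active path is at least r_th levels worse."""
    gap = quantize(median_active, delta_ms) - quantize(median_alt, delta_ms)
    return gap >= r_th

# Both medians fall in level 10, so a 7 ms difference is ignored.
print(should_switch(108.0, 101.0, delta_ms=10.0, r_th=1))  # False
# Levels 13 and 10 differ by three steps, which reaches R_th = 2.
print(should_switch(131.0, 101.0, delta_ms=10.0, r_th=2))  # True
```

With the performance function expressed in quantization steps, the relocation threshold becomes a small integer, and changes in the medians that stay within one level never trigger a relocation.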

Figure 2. Filtering process and interaction between the monitoring and measurement module (MMM) and the route control module (RCM) of a sociable route controller. The Randomized SRC Algorithm within the RCM is outlined in Algorithm 1. [Diagram: the MMM supplies the $RTT_e^{(d,t)}$ samples to the first filter (medians); its output $M_e^{(d,t)}$ feeds the second filter, whose output $Q_e^{(d,t)}$ is used to compute $P_e^{(d,t)}$ for the randomized SRC algorithm in the RCM. Plot: RTT (ms) versus sampling instants (s), showing $RTT_e^{(d,t)}$, $M_e^{(d,t)}$, $Q_e^{(d,t)}$, the quantization step ∆(d,t), and the MTT.]

⁵ Note that with CP, $e_{best}$ might be different from $e'$.

Socialized Route Control

The second facet of the social behavior of the algorithm relates to the dynamics of the median RTTs; more precisely, to how rapidly the median values that are typically computed by IRC systems using a sliding window vary. The motivation for this is that when the median values start to show rather quick variations, the algorithm must react so as to avoid a large number of traffic reassignments in a short timescale. Such RTT dynamics typically occur when several route controllers compete for the same resources, leading to situations where their traffic reassignments interfere with each other. To cope with this problem, we turn the second filter in Fig. 2 into an adaptive filter. This filter is endowed with an adaptive quantization step ∆(d,t) for each target destination d that is automatically adjusted by the algorithm according to the evolution of the median RTTs. If the RTT conditions are smooth, the quantization step is small, and more proactivity is allowed by the controller CC. However, if the RTT conditions could lead to instability, the quantization step ∆(d,t) automatically increases, so the number of changes in the values of $Q_e^{(d,t)}$ is diminished or even stopped until the network conditions become smooth again. This has the effect of desynchronizing only the competing route controllers. Therefore, the filtering technique outlined here allows a controller CC to “sociably” decide whether to switch traffic to an alternative egress link or not, in the sense that the degree of proactivity of CC is constantly adjusted by the adaptive nature of the second filter.
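The article does not give the rule that adapts the quantization step (the formal treatment is in [12]), so the following Python sketch is purely hypothetical. It only illustrates the qualitative behavior described above: the step stays near a minimum while the medians are smooth and grows, up to a cap, when they vary quickly. The variability measure, the gain, and the bounds are all assumptions.

```python
def adaptive_step(medians, delta_min=5.0, delta_max=50.0, gain=2.0):
    """Hypothetical adaptation rule for the quantization step (in ms).

    medians: recent median RTTs for one destination, oldest first.
    The step grows with the mean absolute change between consecutive
    medians and is clamped to [delta_min, delta_max].
    """
    if len(medians) < 2:
        return delta_min
    changes = [abs(b - a) for a, b in zip(medians, medians[1:])]
    variability = sum(changes) / len(changes)
    return max(delta_min, min(delta_max, delta_min + gain * variability))

smooth = [100.0, 100.5, 100.0, 100.5]
oscillating = [100.0, 130.0, 100.0, 130.0]
print(adaptive_step(smooth))       # small step: more proactivity allowed
print(adaptive_step(oscillating))  # large step: relocations are restrained
```

A larger step maps a wider range of medians onto the same quantization level, so fewer level changes, and therefore fewer relocations, occur while the conditions remain unstable.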

For the sake of simplicity, we focused here on the optimization of a single performance metric (the RTT), but the concept of SRC is general and can be extended to consider other metrics, such as available bandwidth, packet losses, and jitter. When multiple metrics are used, two straightforward approaches can be followed.

On the one hand, a combination of two or more metrics can be used in the same performance function $P_e^{(d,t)}$. For instance, [12] introduces a more general performance function based on a non-linear combination of the quantization level $Q_e^{(d,t)}$ and the available bandwidth (AB) in the egress links of the source network. This, in turn, can be extended to consider the AB along the entire path to a target destination d, using available bandwidth estimation techniques like the one described in [5]. With this approach, the weights of the different metrics combined in $P_e^{(d,t)}$ can be tuned on an application basis, for example, to prioritize the role of the AB over the RTTs (or vice versa) depending on the application type.

On the other hand, multiple performance functions $P_e^{(d,t)}$ can be used (e.g., one for each metric), and the selection of the best path for each target destination can be performed by sequentially comparing the performance functions $P_e^{(d,t)}$ and tie-breaking similarly to the BGP tie-breaking rules [7]. With this approach, the order in which the performance functions are compared can be tuned on an application basis. For example, a controller might select the path with the maximum AB, and if there is more than one path with the same AB, choose the one with the lowest RTT.
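The sequential comparison in the example above (maximum AB first, lowest RTT as the tie-breaker) amounts to a lexicographic ordering, which can be sketched in Python as follows. The data layout and the function name `select_path` are assumptions made for illustration.

```python
def select_path(paths):
    """Pick the egress link with the highest AB; break ties by lowest RTT.

    paths maps each egress link to a tuple (available_bw_mbps, rtt_ms).
    """
    # Negating the bandwidth turns "maximum AB" into a minimization, so
    # both criteria can be compared lexicographically in one key.
    return min(paths, key=lambda e: (-paths[e][0], paths[e][1]))

paths = {"e1": (100, 40), "e2": (100, 25), "e3": (80, 10)}  # illustrative
print(select_path(paths))  # e2: ties with e1 on AB, wins on RTT
```

Reordering the elements of the key tuple changes the priority of the metrics, which corresponds to tuning the comparison order on an application basis.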

In either case, adaptive filtering techniques are required to prevent rapid variations in the performance metrics considered.

Randomization

Randomization is present in Algorithm 1 in two different ways: implicitly and explicitly. On the one hand, the route control decisions in Algorithm 1 are inherently stochastic for a number of reasons, for example, because of the algorithm's adaptive behavior over time and because different controllers might have configured different thresholds Rth. On the other hand, we explicitly use a hysteresis switching timer TH that we introduced in previous work [13] and that guarantees a random hysteresis period after each traffic relocation. More precisely, traffic toward a given destination d cannot be relocated until the random, decreasing timer TH reaches 0. A similar approach was used in [5] for one of the randomized algorithms presented there.
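A minimal sketch of such a hysteresis timer is shown below. The article does not state how the random period is drawn, so the uniform distribution and the `min_s`/`max_s` bounds are assumptions made purely for illustration, as are the class and method names.

```python
import random


class HysteresisTimer:
    """Random hold-down period started after each traffic relocation."""

    def __init__(self, min_s, max_s, rng=None):
        if not 0 <= min_s <= max_s:
            raise ValueError("need 0 <= min_s <= max_s")
        self.min_s = min_s
        self.max_s = max_s
        self.rng = rng or random.Random()
        self.remaining = 0.0  # plays the role of TH; zero allows relocation

    def start(self):
        # Assumption: the period is drawn uniformly from [min_s, max_s].
        self.remaining = self.rng.uniform(self.min_s, self.max_s)

    def tick(self, elapsed_s):
        # The timer only decreases and never goes below zero.
        self.remaining = max(0.0, self.remaining - elapsed_s)

    def can_relocate(self):
        return self.remaining == 0.0
```

A controller would call `start()` right after switching traffic for a destination and check `can_relocate()` before the next switch for that same destination.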

Performance Evaluation

The performance of our SRC strategy is compared against that obtained with:
• Randomized IRC
• Default IGP/BGP routing

Evaluation Methodology and Simulation Set Up

The simulation tests were performed using the event-driven simulator J-Sim [14]. All the functionalities of the route controllers were developed on top of the IGP/BGP implementations available in this platform.

Network Topology — The network topology was built using the Boston University Representative Internet Topology gEnerator (BRITE) [15]. The topology was generated using the Waxman model with (α, β) set to (0.15, 0.2) [16], and it was composed of 100 domains with a ratio of domains to inter-domain links of 1:3. This simulated network aims at representing a set of ISPs that can provide connectivity and reachability to customers operating stub networks. We assume that all ISPs operate points of presence (PoPs) through which the stub networks are connected. We considered 12 uniformly distributed stub networks across the domain-level topology as the traffic sources toward the set of target destinations. These source networks are connected to the routers located at the PoPs of three different ISPs. We considered triple-homed stub networks given that significant performance improvements are not expected from higher degrees of multihoming [1]. For the stub networks containing target destinations, we considered 25 uniformly distributed destinations across the domain-level topology. This offers an emulation of 12 × 25 = 300 IRC flows competing for the same network resources during the simulation run time.
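For readers unfamiliar with the Waxman model, the sketch below shows the commonly cited edge probability α·exp(−d/(βL)), where d is the distance between two nodes and L is the maximum distance in the plane. It is an assumption that the simulations used exactly this form: the roles of α and β differ between tools, and BRITE's node placement and growth options are not described in the text. The function names are hypothetical.

```python
import math
import random


def waxman_probability(dist, max_dist, alpha=0.15, beta=0.2):
    """Edge probability alpha * exp(-dist / (beta * max_dist))."""
    return alpha * math.exp(-dist / (beta * max_dist))


def waxman_edges(points, alpha=0.15, beta=0.2, rng=None):
    """Draw undirected edges (i, j), i < j, between 2-D points."""
    rng = rng or random.Random()
    pairs = [(i, j) for i in range(len(points))
             for j in range(i + 1, len(points))]
    if not pairs:
        return []
    max_dist = max(math.dist(points[i], points[j]) for i, j in pairs)
    if max_dist == 0:
        raise ValueError("all points coincide")
    edges = []
    for i, j in pairs:
        d = math.dist(points[i], points[j])
        # Nearby nodes are more likely to be connected than distant ones.
        if rng.random() < waxman_probability(d, max_dist, alpha, beta):
            edges.append((i, j))
    return edges
```

With α = 0.15 and β = 0.2, two co-located nodes are linked with probability 0.15, and the probability decays quickly with distance, so most links are short.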

Furthermore, given that IRC solutions operate in short timescales, we assumed that the domain-level topology remains invariant during the simulation run time.

Algorithm 1. Randomized SRC algorithm.

Input:
  d – A target destination of network S
  {e} – Set of egress links of network S
  P_e(d,t) – Performance function to reach d through e at time t

Output:
  ebest – The best egress link to reach target destination d

1: Wait for changes in P_ebest(d,t)
2: if P_ebest(d,t) – P_e(d,t) < Rth ∀e ≠ ebest then go to Step 1
3: /* Egress link selection process for d */
4: Choose e′ as P_e′(d,t) = min{P_e(d,t)}
5: Estimate the performance after switching the traffic
6: if P_ebest(d,t) – P_e′(d,t)_Estimate ≥ Rth then
7:     Wait until TH = 0 /* Hysteresis Switching Timer */
8:     Switch traffic toward d from ebest to e′
9:     ebest ← e′
10:    P_ebest(d,t) ← P_e′(d,t)
11: end if
12: /* End of egress link selection process for d */
13: Go to Step 1
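One pass through Steps 2 to 11 of Algorithm 1 can be sketched as a pure function. This is a simplified reading rather than the authors' implementation: the estimation step and the hysteresis timer are passed in as hypothetical callables, Rth is assumed to be positive, and instead of blocking at Step 7 the sketch defers the switch to a later pass when the timer has not yet expired.

```python
def src_decision(perf, e_best, r_th, estimate_after_switch, timer_expired):
    """Return the egress link to use after one pass of the selection logic.

    perf maps each egress link to its current P_e(d,t); lower is better.
    """
    current = perf[e_best]
    # Step 2: stay put unless some alternative is better by at least r_th.
    if all(current - p < r_th for e, p in perf.items() if e != e_best):
        return e_best
    # Step 4: candidate with the minimum performance value.
    e_new = min(perf, key=perf.get)
    # Steps 5-6: re-check the gain against the estimated post-switch value.
    if current - estimate_after_switch(e_new) >= r_th:
        # Step 7 (simplified): switch only once the hysteresis timer is 0.
        if timer_expired():
            return e_new  # Steps 8-10
    return e_best


# Hypothetical performance values for three egress links.
perf = {"e1": 100.0, "e2": 60.0, "e3": 90.0}
chosen = src_decision(perf, "e1", 10.0, lambda e: perf[e] + 5.0, lambda: True)
```

The estimate in Step 5 matters because moving traffic onto e′ changes the load, and hence the performance, of that link; here a fixed penalty of 5.0 stands in for that effect.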


Simulation Scenarios — We ran the same simulations separately using three different scenarios:
• Default IGP/BGP routing, where BGP routers choose their best routes based on the shortest AS-path
• BGP combined with the SRC strategy at the 12 source domains
• BGP combined with randomized IRC systems at the 12 source domains

For a more comprehensive comparison between the different route control strategies, we performed the simulations for three different network loads. We considered the following load factors (L):
• L = 0.450, low load corresponding to an average occupancy of 45 percent of the egress links capacity
• L = 0.675, medium load corresponding to an average occupancy of 67.5 percent of the egress links capacity
• L = 0.900, high load corresponding to an average occupancy of 90 percent of the egress links capacity

Simulation Conditions — The simulation tests were conducted using traffic aggregates sent from the source domains to each target destination d. These traffic aggregates were composed of a variable number of multiplexed Pareto flows as a way to generate the traffic demands, as well as to control the network load during the tests. The flow arrivals were modeled according to a Poisson process and were independently and uniformly distributed during the simulation run time. This approach aims at generating sufficient traffic variability to support the assessment of the different route control strategies.
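A traffic aggregate of this kind can be approximated with the standard library alone. The sketch below draws exponential inter-arrival gaps (which yields Poisson arrivals) and Pareto-distributed flow sizes. The text does not say which flow property follows the Pareto distribution, nor which parameters were used, so treating size as the Pareto quantity and all parameter values here are assumptions.

```python
import random


def generate_flows(rate_per_s, duration_s, pareto_shape, min_size, rng=None):
    """Return (arrival_time, size) tuples for one traffic aggregate."""
    rng = rng or random.Random()
    flows = []
    # Exponential gaps between arrivals give a Poisson arrival process.
    t = rng.expovariate(rate_per_s)
    while t < duration_s:
        # paretovariate() returns values >= 1, so scale by the minimum size.
        size = min_size * rng.paretovariate(pareto_shape)
        flows.append((t, size))
        t += rng.expovariate(rate_per_s)
    return flows


# Hypothetical parameters: 2 flows/s for 100 s, shape 1.5, minimum size 10.
flows = generate_flows(2.0, 100.0, 1.5, 10.0, random.Random(42))
```

Raising `rate_per_s` or the number of multiplexed aggregates is one way to move between the load factors used in the tests.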

In addition, we used the following method to generate traffic demands for the remaining Internet traffic, usually referred to as background traffic. We started by randomly picking four nodes in the network. The first one chosen acts as the origin (O) node, and the remaining three nodes act as destinations (D) of the background traffic. We assigned one Pareto flow for each O-D pair. This process continues until all the nodes are assigned with three outgoing flows (including those in the multihomed stub domains and those in the ISPs). All background connections were active during the simulation run time.
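The background-traffic assignment can be sketched as follows. This is one possible reading of the procedure: every node is visited once as an origin, in random order, and receives three distinct random destinations. The function name and the `flows_per_origin` parameter are hypothetical.

```python
import random


def assign_background_flows(nodes, flows_per_origin=3, rng=None):
    """Return (origin, destination) pairs, three per origin by default."""
    rng = rng or random.Random()
    if len(nodes) < flows_per_origin + 1:
        raise ValueError("not enough nodes for distinct destinations")
    origins = list(nodes)
    rng.shuffle(origins)  # visit the origins in random order
    od_pairs = []
    for origin in origins:
        others = [n for n in nodes if n != origin]
        # Draw distinct destinations, never the origin itself.
        for dest in rng.sample(others, flows_per_origin):
            od_pairs.append((origin, dest))
    return od_pairs


od_pairs = assign_background_flows(list(range(10)), rng=random.Random(5))
```

Each returned pair would then carry one Pareto flow that stays active for the whole run.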

Furthermore, the frequency and size of the probes sent by the route controllers were correlated with the outbound traffic being controlled, just as conventional route controllers do today [2–4].

Finally, we assume that the route controllers have pre-established performance bounds (i.e., the MTTs) for the traffic under control. For instance, the recommendation G.114 of the International Telecommunication Union-Telecommunication Standardization Sector (ITU-T) suggests a one-way-delay (OWD) bound of 150 milliseconds to maintain a high quality VoIP communication over the Internet. Thus, for VoIP traffic, the maximum RTT tolerated was chosen as twice this OWD bound, that is, 300 ms.

Objectives of the Performance Evaluation

Our evaluations have two main objectives.

Assess the Number of Path Switches — The first objective of the simulation study is to demonstrate that the sociable nature of our SRC strategy contributes to drastically reducing the potential interference between competing route controllers. To this end, we compared the number of path switches that occurred during the simulation run time for the 300 competing IRC flows for the SRC and randomized IRC scenarios. The number of path switches is obtained by adding the number of route changes that are required to meet the desired RTT bound for each target destination d.
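The counting rule can be illustrated with a small sketch that sums egress-link changes over per-flow selection logs. The log format and the function names are hypothetical; the article does not describe how the simulator records route changes.

```python
def count_path_switches(egress_history):
    """Count changes of egress link in one flow's selection log."""
    return sum(
        1 for prev, cur in zip(egress_history, egress_history[1:])
        if prev != cur
    )


def total_path_switches(histories):
    """Add up the switches over all flows (one log per flow)."""
    return sum(count_path_switches(h) for h in histories.values())


# Hypothetical logs for two (source, destination) flows.
histories = {
    ("s1", "d1"): ["e1", "e1", "e2", "e2", "e1"],
    ("s1", "d2"): ["e3", "e3", "e3"],
}
total = total_path_switches(histories)
```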

It is worth emphasizing that in both the randomized IRC and SRC strategies, the route controllers operate independently and compete for the same network resources. This allows us to evaluate the overall impact on the traffic caused by the interference between several standalone route controllers running at different stub domains. Thus, when analyzing the results for the different route control strategies, it is important to keep in mind that we take into account all the competing route controllers present in the network.

Figure 3. Number of path switches (top) and ⟨RTTs⟩ (bottom) for L = 0.450 (left), L = 0.675 (center), and L = 0.900 (right). [Plots not reproduced. Horizontal axis: Rth. Vertical axes: path switches (top) and RTTs in ms (bottom). Curves: SRC and randomized IRC (top); SRC, randomized IRC, and IGP/BGP routing (bottom).]

To contrast the number of path switches under fair conditions, we made the following decisions. First, both the randomized IRC and SRC controllers are endowed with the same (explicit) randomization technique [5, 13]. This approach avoids the appearance of persistent oscillations that might lead to a large number of path switches in the case of conventional IRC [5]. Second, both types of controllers follow a controlled proactivity approach. We have conducted the simulations modeling the same triggering condition Rth for both of them. The main difference is that in the SRC case, the social adaptability of the controllers can result in the trigger being reached more often, or less often, depending on the variability of the RTTs on the network.

End-to-End Traffic Performance — The second objective of the simulation study is to demonstrate that the drastic reduction in the number of path switches obtained with our SRC strategy can be achieved without penalizing the end-to-end traffic performance. To this end, we compared the RTTs obtained for the 300 flows in the three different scenarios, namely, default IGP/BGP, SRC, and randomized IRC.

Main Results

The top of Fig. 3 illustrates the total number of path switches performed by both the randomized IRC and SRC strategies, in all the stub networks, and for the three different load factors: L = 0.450 (left), L = 0.675 (center), and L = 0.900 (right). The number of path switches is contrasted for different triggering conditions, that is, for different values of the threshold Rth (shown on a logarithmic scale).

Several conclusions can be drawn from the results shown in Fig. 3. In the first place, the results confirm that SRC drastically reduces the number of path switches compared to a randomized IRC technique.⁶ An important result is that the reductions are significant for all the load factors assessed. For instance, when compared with randomized IRC, our SRC strategy contributes to reductions of up to:
• 77 percent for Rth = 1 and 71 percent for Rth = 2 when L = 0.450
• 75 percent for Rth = 1 and 74 percent for Rth = 2 when L = 0.675
• 34 percent for Rth = 1 and 36 percent for Rth = 2 when L = 0.900

The second observation is that the reductions in the number of path switches offered by the SRC strategy become more and more evident as the proactivity of the controllers increases, that is, for low values of Rth, which is precisely the region where IRC solutions operate today. It is worth recalling that these results were obtained when both route control strategies were complemented by the same randomized decisions. This confirms that in a competitive environment, SRC is much more effective than pure randomization in reducing the potential interference between route controllers.

On the other hand, our results show that when the route control strategies become less proactive, that is, for higher values of Rth, randomized IRC and SRC tend to behave comparably, so SRC does not introduce any benefit over a randomized IRC technique.

To assess the effectiveness of SRC, it is necessary to confirm that the reductions obtained in the number of path switches are not so large that they have a negative impact on the end-to-end traffic performance. To this end, we first analyze the performance of randomized IRC and our SRC “globally,” that is, by averaging the RTTs obtained by “all” competing route controllers. This is shown at the bottom of Fig. 3 and in Fig. 4. The end-to-end performance obtained by “each” route controller individually is shown in Fig. 5.

The bottom of Fig. 3 reveals that, as expected, both SRC and randomized IRC perform much better than IGP/BGP for all values of L and Rth, and the improvements in the achieved performance become more evident as the network utilization increases. In particular, SRC is capable of improving the ⟨RTTs⟩⁷ by more than 40 percent for L = 0.675 and by more than 35 percent for L = 0.900 when compared with IGP/BGP.

Moreover, the ⟨RTTs⟩ obtained by SRC and IRC are comparable. For L = 0.675 in particular, SRC not only drastically reduces the number of path switches, but also improves the end-to-end performance for almost all the triggering conditions assessed. It is worth emphasizing that a low value of Rth together with a load factor of L = 0.675 reasonably reflects the conditions in which IRC currently operates in the Internet.

Our results also reveal an important aspect: by allowing more path switches, some route controllers can slightly improve their end-to-end performance, but such actions have no major effect on the overall ⟨RTTs⟩. Indeed, a certain number of path switches is always required, and this number of path switches is what actually ensures the average performance observed in the RTTs at the bottom of Fig. 3 (this becomes clear as the proactivity decreases).

By analyzing Fig. 3 as a whole, it becomes evident that the selection of the best triggering condition actually depends on the load present in the network. For this particular case, the best trade-offs are Rth = 30 for L = 0.450, Rth = 10 for L = 0.675, and Rth = 7 for L = 0.900, which is a reasonable progression to lower values of Rth because the route controllers require less proactivity when the network utilization is low. The corollary is that the triggering condition should be adaptively adjusted as well, depending on the amount of traffic carried through the egress links of the domain. We plan to investigate this in the future.

Figure 4. Complementary cumulative distribution function (CCDF) of the RTTs for the 300 competing IRC flows, for Rth = 1, and for L = 0.450 (left), L = 0.675 (center), and L = 0.900 (right). [Plots not reproduced. Horizontal axis: RTTs (ms). Vertical axis: P(RTT ≥ x). Curves: SRC, randomized IRC, and IGP/BGP routing.]

6 Clearly, no results are shown for the default IGP/BGP routing scenario here because BGP does not perform path switching actively.

7 As mentioned previously, this average is computed over the RTTs obtained by all competing route controllers in the network.

Figure 4 compares the distribution of the RTTs obtained by IGP/BGP, SRC, and randomized IRC for the 300 competing IRC flows, for the three different load factors assessed, and for Rth = 1, which as mentioned above is in the range of operation of the IRC solutions presently deployed in the Internet. To facilitate the interpretation of the results, we use the complementary cumulative distribution function (CCDF).

An important observation is that under high egress link utilization, that is, L = 0.900, there is a fraction of ⟨RTTs⟩ for which the bound of 300 ms is exceeded in the case of IGP/BGP; whereas both SRC and the randomized IRC fulfill the targeted bound.
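As a reminder of how such a curve is read, the empirical CCDF at a point x is simply the fraction of samples that are at least x, so the value at x = 300 ms gives the share of RTTs that reach or exceed the bound. The sketch below uses made-up sample values, not data from the simulations.

```python
def ccdf(samples, x):
    """Empirical P(RTT >= x) for a list of RTT samples."""
    if not samples:
        raise ValueError("no samples")
    return sum(1 for s in samples if s >= x) / len(samples)


# Hypothetical RTT samples in milliseconds.
rtts_ms = [50, 100, 150, 200, 350]
share_at_or_above_bound = ccdf(rtts_ms, 300)
```

A strategy meets the 300 ms bound for all flows when this share is zero.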

To complete the analysis, Fig. 5 provides a more granular picture than Fig. 4 because it shows the CCDFs of the RTTs obtained by each of the 12 competing route controllers. The figure shows the results for the three studied scenarios and for all the load factors assessed when Rth = 1. Our results show that the targeted bound of 300 ms is satisfied by both SRC and randomized IRC in all cases and for all controllers. IGP/BGP, however, shows a distribution of large delays given that the shortest AS-paths are not necessarily the best performing paths. Figure 5 also shows that when considering boxes individually, randomized IRC achieves slightly better end-to-end performance for some of them but at the price of a much larger number of path switches when Rth = 1:
• ≈ 435 percent larger for L = 0.450
• ≈ 400 percent larger for L = 0.675
• ≈ 80 percent larger for L = 0.900

Figure 5. CCDFs for IGP/BGP routing (top), SRC (center), and randomized IRC (bottom), for L = 0.450 (left), L = 0.675 (center), and L = 0.900 (right). [Plots not reproduced. Horizontal axis: RTTs (ms). Vertical axis: P(RTT ≥ x).]

Conclusion

In this article, we examined the strengths and weaknesses of randomized IRC techniques in a competitive environment. We proposed a way to blend randomization with a sociable route control (SRC) strategy, where by sociable, we mean a route control strategy that explicitly considers the potential implications of its decisions in the performance of the network and with the ability to adaptively restrain its intrinsic selfishness depending on the network conditions. We have shown that in a competitive scenario, our SRC strategy is capable of drastically reducing the potential interference between controllers without penalizing the end-to-end traffic performance. This makes SRC more scalable and promising than pure randomization, given the proliferation of IRC systems in the Internet.

SRC strategies, like the one described in this article, also have a number of practical advantages; for example, they do not require any kind of coordination between the competing IRC boxes; and they can be supported by a lightweight software implementation based on well-known filtering techniques, with no additional requirements to be adopted other than a software upgrade of existing IRC systems.

Among the open issues in the area, the most important is the lack of a stochastic model characterizing the distribution of path switches in a competitive environment. Studies like [5] have shown that randomized techniques are effective in desynchronizing some route controllers when their measurement windows are sufficiently overlapped; however, they cannot guarantee stability. Only after characterizing the distribution of path switches will it be possible to formally study the local and global stability aspects of competitive IRC. Furthermore, the proposals and results described here apply to the optimization of VoIP traffic, but the concept of blending randomization with an SRC strategy is general in scope, so our work can be extended to control other kinds of traffic flows concurrently, as well as to consider other performance metrics besides the RTT.

References

[1] A. Akella et al., “A Measurement-Based Analysis of Multihoming,” Proc. ACM SIGCOMM, Karlsruhe, Germany, Aug. 2003.
[2] Avaya, Inc., “Converged Network Analyzer.”
[3] Cisco Systems, Inc., “Optimized Edge Routing.”
[4] Internap Networks, Inc., “Flow Control Platform.”
[5] R. Gao, C. Dovrolis, and E. W. Zegura, “Avoiding Oscillations Due to Intelligent Route Control Systems,” Proc. IEEE INFOCOM 2006, Barcelona, Spain, Apr. 2006.
[6] C. Labovitz et al., “Delayed Internet Routing Convergence,” Proc. ACM SIGCOMM, Stockholm, Sweden, Aug. 2000.
[7] M. Yannuzzi, X. Masip-Bruin, and O. Bonaventure, “Open Issues in Interdomain Routing: A Survey,” IEEE Network, vol. 19, no. 6, Nov.–Dec. 2005, pp. 49–56.
[8] A. Akella et al., “A Comparison of Overlay Routing and Multihoming Route Control,” Proc. ACM SIGCOMM, Portland, OR, Aug. 2004.
[9] D. K. Goldenberg et al., “Optimizing Cost and Performance for Multihoming,” Proc. ACM SIGCOMM, Portland, OR, Aug. 2004.
[10] F. Guo et al., “Experiences in Building a Multihoming Load Balancing System,” Proc. IEEE INFOCOM ’04, Hong Kong, China, Mar. 2004.
[11] A. Akella, S. Seshan, and A. Shaikh, “Multihoming Performance Benefits: An Experimental Evaluation of Practical Enterprise Strategies,” USENIX Annual Technical Conf., Boston, MA, June 2004.
[12] M. Yannuzzi, “Strategies for Internet Route Control: Past, Present, and Future,” Ph.D. diss., Tech. Univ. of Catalonia, Barcelona, Spain, 2007.
[13] M. Yannuzzi et al., “A Proposal for Inter-Domain QoS Routing Based on Distributed Overlay Entities and QBGP,” Proc. QoFIS ’04, LNCS 3266, Barcelona, Spain, Oct. 2004.
[14] J-Sim homepage; http://www.j-sim.org
[15] A. Medina et al., “BRITE: An Approach to Universal Topology Generation,” Proc. MASCOTS, Aug. 2001.
[16] B. Waxman, “Routing of Multipoint Connections,” IEEE JSAC, Dec. 1988.

Biographies

MARCELO YANNUZZI ([email protected]) received a degree in electrical engineering from the University of the Republic (UdelaR), Uruguay, in 2001, and DEA (M.Sc.) and Ph.D. degrees in computer science from the Department of Computer Architecture, Technical University of Catalonia (UPC), Spain, in 2005 and 2007, respectively. He is with the Advanced Network Architectures Lab at UPC, where he is an assistant professor. He held previous positions with the Physics Department of the School of Engineering, UdelaR, from 1997 to 2003, and with the Electrical Engineering Department of the same university from 2003 until 2006. He worked in industry for 10 years at the national telco in Uruguay (1993–2003).

XAVI MASIP-BRUIN ([email protected]) received M.S. and Ph.D. degrees from UPC, both in telecommunications engineering, in 1997 and 2003, respectively. He is currently an associate professor of computer science at UPC. His current research interests are in broadband communications, QoS management and provision, and traffic engineering. His publications include around 60 papers in national and international refereed journals and conferences. Since 2000 he has participated in many research projects: IST projects E-NEXT, NOBEL, and EuQoS; and Spanish research projects SABA, SABA2, SAM, and TRIPODE.

EVA MARIN-TORDERA ([email protected]) received M.S. degrees in physics in 1993 and electronic engineering in 1998, both from Barcelona University, and a Ph.D. from UPC in 2007, where she works as an assistant professor. She has published many papers in national and international conferences. Her main interests focus on QoS provisioning and optical networks. She is now actively participating in the BONE and DICONET international projects, and in the national project CATARO.

JORDI DOMINGO-PASCUAL ([email protected]) is a full professor of computer science and communications at UPC. He is co-founder of and a researcher at the Advanced Broadband Communications Center (CCABA) of the university. His research topics are broadband communications and applications, IP/ATM integration, QoS management and provision, traffic engineering, IP traffic analysis and characterization, and QoS measurements.

ALEXANDRE FONTE ([email protected]) graduated in electrical engineering from the University of Coimbra, Portugal, in 1995, and received his M.Sc. degree in electronic and telecommunications engineering (distributed systems specialty) from the University of Aveiro, Portugal, in 2000. He is currently a Ph.D. student in computer engineering at the Department of Informatics Engineering, University of Coimbra. His Ph.D. research activity is focused on interdomain quality of service routing and traffic engineering in IP networks.

EDMUNDO MONTEIRO ([email protected]) is an associate professor at the University of Coimbra, Portugal, from which he graduated in 1984 and received a Ph.D. in electrical engineering (computer specialty) in 1995. His research interests are computer communications, QoS, mobility, routing, resilience, and security.
