NIRA: A New Internet Routing Architecture
Xiaowei Yang
MIT CSAIL
Presented by:
Prasad
Why NIRA?
• Problems with today’s routing system
  • No user choice
  • Does not scale well: continuing growth of global routing state
  • No fault isolation
• NIRA solves these problems.
No User Choice!
• Unlike in the telephone system, users cannot choose wide-area providers separately from local providers.
  • Pricing, quality of service, security…
• Wide-area providers cannot offer differentiated services directly to users.
  • Quality of service
[Figure: the last-mile duopoly/monopoly — users reach the backbones only through local providers such as Verizon or Comcast; 21 million broadband subscribers in June 2003.]
We Want to Let Users Choose Domain-Level Routes
• Our hypothesis:
  • User choice stimulates competition.
  • Competition fosters innovation.
• Validation requires market deployment.
  • NIRA: the technical foundation.
[Figure: a user’s local ISP connects to competing wide-area providers such as AT&T and UUNET.]
Continuing Growth of Global Routing State
• Real world requirements such as multi-homing are not well supported.
Courtesy of http://bgp.potaroo.net/
No Fault Isolation
• Local failure causes global routing update.
• Routing loops and packet drops occur in the transient state.
• Routing convergence takes on the order of minutes.
Our Approach:
NIRA
Overview of NIRA
• A scalable architecture that gives users the ability to select routes.
• “User” is an abstract entity, e.g., a software agent.
• “Domain-level” choices
  • Encourage ISP competition
  • Individual domain’s decision to offer “router-level” choices
Design Overview (1): Route Discovery

[Topology figure: core domains B1–B4, regional providers R1–R10, and edge networks N1–N18; users Bob, Alice, and Cindy attach at the edge. Each user discovers the part of the topology on its routes to the core; one link is marked X.]
Design Overview (2): Route Representation
[Topology figure: the same network; the route between Bob and Alice is represented by a pair of addresses.]
Design Overview (3): Failure Handling

[Topology figure: the same network; after a link failure, Bob switches to an alternate route.]
• Provider compensation will not be discussed in detail.
System Components of NIRA
• Addressing
• Route discovery
• Name-to-Route mapping
• Route failure handling
NIRA’s Addressing
• Strict provider-rooted hierarchical addressing
  • An address represents a valid route to the core.
[Figure: provider-rooted prefix allocation. Core providers B1 and B2 hold 1::/16 and 2::/16; R1, R2, and R3 receive 1:1::/32, 1:2::/32, 1:3::/32, and 2:1::/32; edge networks N1–N3 receive 1:1:1::/48, 1:2:1::/48, 1:2:2::/48, 1:3:1::/48, and 2:1:1::/48. Bob holds addresses 1:1:1::1000 and 1:2:1::1000; Alice holds 1:3:1::2000 and 2:1:1::2000.]
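The allocation rule in the figure can be sketched in a few lines (my illustration; the `allocate` helper and the string-based prefixes are simplifications of real IPv6 prefix delegation):

```python
# Sketch of strict provider-rooted hierarchical addressing: a network
# obtains one new prefix per provider prefix, by appending its own
# subdivision label. Every prefix therefore encodes one valid route to
# the core.

def allocate(provider_prefixes, subdivision):
    """One new prefix per provider prefix, i.e. per route to the core."""
    return [p + ":" + subdivision for p in provider_prefixes]

core_b1 = ["1"]               # B1 holds 1::/16
r1 = allocate(core_b1, "1")   # R1 gets 1:1::/32
r2 = allocate(core_b1, "2")   # R2 gets 1:2::/32
n1 = allocate(r1 + r2, "1")   # multi-homed N1 gets 1:1:1::/48 and 1:2:1::/48
print(n1)  # → ['1:1:1', '1:2:1']
```

Bob, attached to N1, then derives one address per prefix (1:1:1::1000 and 1:2:1::1000), matching the figure.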
Why is NIRA’s Addressing Scalable?
• Financial factors limit the size of the core.
• The provider hierarchy is shallow.
• A domain has a limited number of providers.
Efficient Route Representation
• A source and a destination address unambiguously represent a common type of route.
• General routes may use source routing headers.
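A minimal sketch of this representation, assuming the colon-separated prefix structure from the addressing figures (the `chain` and `route` helpers are illustrative, not part of NIRA):

```python
# A (source, destination) address pair encodes a domain-level route:
# the source address spells out the uphill chain of providers to the
# core, and the destination address spells out the downhill chain, so
# together they name an "up, across, down" route unambiguously.

def chain(addr):
    """Prefixes from the host's network up to the core provider."""
    labels = addr.split("::")[0].split(":")
    return [":".join(labels[:i]) for i in range(len(labels), 0, -1)]

def route(src, dst):
    up = chain(src)                    # e.g. 1:1:1 -> 1:1 -> 1
    down = list(reversed(chain(dst)))  # e.g. 1 -> 1:3 -> 1:3:1
    # Merge at the shared core provider if there is one.
    return up + down[1:] if up[-1] == down[0] else up + down

print(route("1:1:1::1000", "1:3:1::2000"))
# → ['1:1:1', '1:1', '1', '1:3', '1:3:1']
```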
Routers’ Forwarding Tables
• Uphill table: providers
• Downhill table: customers, self
• Bridge table: all others
Example (router R1):
  Uphill table:   1::/16 → B1
  Downhill table: 1:1:1::/48 → N1, 1:1::/96 → self
Basic Forwarding Algorithm
1. Look up the destination address in the downhill table.
2. If no match, look up the source address in the uphill table.
[Figure: a packet from Bob (source 1:1:1::1000) to Alice (destination 1:3:1::2000) travels up toward the core, matching the source address in uphill tables, then down to Alice, matching the destination address in downhill tables.]
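The two-step lookup can be sketched as follows (an illustration using R1's example tables from the earlier slide; prefixes are modeled as address-string prefixes rather than real IPv6 longest-prefix matching):

```python
# Minimal sketch of NIRA's basic forwarding algorithm.

def longest_match(table, address):
    """Return the next hop for the longest prefix matching `address`."""
    best = None
    for prefix, next_hop in table.items():
        if address.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, next_hop)
    return best[1] if best else None

def forward(downhill, uphill, src, dst):
    # Step 1: destination address in the downhill table.
    hop = longest_match(downhill, dst)
    if hop is not None:
        return ("down", hop)
    # Step 2: otherwise, source address in the uphill table.
    hop = longest_match(uphill, src)
    if hop is not None:
        return ("up", hop)
    return None  # no route: drop the packet

# Router R1's tables from the slide:
downhill = {"1:1:1:": "N1", "1:1:": "self"}
uphill = {"1:": "B1"}

# A packet from Bob to Alice matches nothing in R1's downhill table,
# so it is forwarded up toward the core via B1.
print(forward(downhill, uphill, "1:1:1::1000", "1:3:1::2000"))  # → ('up', 'B1')
```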
System Components of NIRA
• Addressing
• Route discovery: Topology Information Propagation Protocol (TIPP)
• Name-to-Route mapping
• Failure handling
What does TIPP Do?
• Propagates addresses.
• Propagates the “up-graph”: providers and their interconnections on a user’s routes to the core.
[Figure: via TIPP, Bob learns his addresses 1:1:1::1000, 1:2:1::1000, and 2:2:1::1000, plus his up-graph; one link is marked X, temporarily unusable.]
What is TIPP Like?
• Supports scoped propagation.
• Provides a consistent view of the network.
• A simple algorithm, proven correct [SG89].
• No sequence numbers, no periodic refreshes, no timestamps.
Why is TIPP Scalable? (1)
• The up-graph is small.
[Topology figure: a user’s up-graph — its access networks, their providers, and the core — is a small fraction of the full topology.]
Why is TIPP Scalable? (2)
• Scoped propagation → fault isolation
[Topology figure: a failure marked X propagates only within the scope of the affected region, not network-wide.]
System Components of NIRA
• Addressing
• Route discovery
• Name-to-Route mapping
• Failure handling
Name-to-Route Lookup Service (NRLS)
• An enhanced DNS service
[Figure: Bob queries the foo.com NRLS server for Alice.foo.com and receives Alice’s addresses, 1:3:1::2000 and 2:1:1::2000.]
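One plausible way a user could combine its own addresses with the NRLS answer is sketched below (the matching heuristic is my assumption, not NIRA's specified selection logic):

```python
# After NRLS returns the destination's addresses, the sender picks a
# (source, destination) address pair. Here we prefer a pair whose
# top-level provider prefixes match, so the route stays within one
# core provider.

def top_prefix(addr):
    return addr.split(":")[0]

def pick_pair(my_addrs, dst_addrs):
    for s in my_addrs:
        for d in dst_addrs:
            if top_prefix(s) == top_prefix(d):
                return (s, d)
    # Fall back to any pair; the route then crosses between core providers.
    return (my_addrs[0], dst_addrs[0])

bob = ["1:1:1::1000", "1:2:1::1000"]
alice = ["1:3:1::2000", "2:1:1::2000"]
print(pick_pair(bob, alice))  # → ('1:1:1::1000', '1:3:1::2000')
```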
System Components of NIRA
• Addressing
• Route discovery protocol
• Name-to-Route mapping
• Failure handling
How Route Failures are Handled
• A combination of TIPP notifications and router feedback or timeouts.
• Switching addresses switches to a different route.
  • HIP [Moskowitz04], SCTP [RFC 2960], TCP Migrate [Snoeren00]
[Figure: a link on the route encoded by Bob’s address 1:1:1::1000 fails (X); Bob switches to his other address, 1:2:1::1000, to use a different route to Alice.]
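A hedged sketch of the address-switching idea (the bookkeeping below is my illustration; the cited HIP/SCTP/TCP Migrate mechanisms are what keep transport connections alive across the address change):

```python
# On a failure notification (from TIPP, router feedback, or a timeout)
# for the route encoded by the current source address, the user switches
# to another of its addresses, and hence to a different route.

def pick_source(addresses, failed):
    """Choose the first address whose route is not marked failed."""
    for a in addresses:
        if a not in failed:
            return a
    return None  # all known routes are down

bob = ["1:1:1::1000", "1:2:1::1000"]
failed = {"1:1:1::1000"}  # this address's route was reported unusable
print(pick_source(bob, failed))  # → 1:2:1::1000
```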
NIRA Solves these Problems:
• User choice
  • Choosing addresses → choosing routes → choosing providers
• Scalability
  • Modularized route discovery.
  • Constrained failure propagation.
Evaluation
Data Sets
• Domain-level topologies from BGP routing tables
• Inferred domain relationships [Gao00] [Subramanian02]
• Not completely accurate, but the best available practice.
• Data from 2001 to 2004 [Agarwal]
Evaluation of Scalability
• Methodology: measure the amount of state each domain keeps under NIRA
  • Number of providers in the core
  • Number of address prefixes
  • Size of up-graphs
  • Size of forwarding tables
• Conclusion: scalable in practice
The Internet Continues to Grow
• In practice:
  • The level of provider hierarchy (h) is shallow.
  • A domain has a limited number (p) of providers.
• In theory: grows as p^h.
Provider-rooted Hierarchical Addressing is Practical.
Up-graphs are Small.
• Analysis
  • Level of provider hierarchy: h
  • Number of a domain’s providers: p
  • Number of a domain’s peers: q
  • Up-graph size: Σ_{i=1}^{h} p^(i-1) (p + q)
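The bound can be sanity-checked numerically (a small script of mine, not from the talk):

```python
# Up-graph size bound from the slide: sum over i = 1..h of p^(i-1) * (p + q).

def upgraph_size(h, p, q):
    return sum(p ** (i - 1) * (p + q) for i in range(1, h + 1))

# With a shallow hierarchy (h = 3) and a few providers and peers per
# domain (p = 2, q = 3), the up-graph stays small:
print(upgraph_size(3, 2, 3))  # → 35
```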
Evaluation of TIPP in Dynamic Networks
• Methodology: packet-level simulations using sampled topologies
• Conclusions
  • Low communication cost
  • Fast convergence
Communication Cost of TIPP is Low.
• Average: total messages (bytes) / link / failure
• Maximum: maximum messages (bytes) seen over one link / failure
• Scalable: scoped propagation
• Convergence: no message churning
TIPP Converges Fast
• Link delays are uniformly distributed in [10ms, 110ms].
• Single-failure convergence time is proportional to the shortest-path propagation delay.
Conclusion
• User choice
  • Choosing addresses → choosing routes → choosing providers
• Scalability
  • Modularized route discovery.
  • Constrained failure propagation.
• Evaluation shows NIRA is practical.
• Looking forward
  • New provider compensation models
  • Stable routing with user choice
  • Deployment of NIRA
Core Routing Region is Scalable
• Financial factors limit the size of the core.
Forwarding Tables
• No need to dynamically compute paths to reach a prefix.
• Common case: small.
• Analysis:
  • Number of prefixes: r
  • Number of customers: c
  • Number of peers: q
  • Table size: r + r + r·c + q
TIPP in Dynamic Networks
• Simulation topologies
  • Pick random leaf domains.
  • Recursively include their providers and peers.
Workload Analysis of NRLS Update
• A fundamental tradeoff:
  • Topology change causes address change.
  • Root servers reside in top-level providers.
• Route record updates mimic a renumbering event. How often can route updates happen?
  • Route server processing time: 1 ms per update → 1000 updates/second/server; 100,000 updates → 100 seconds ≈ 2 minutes.
  • Bandwidth: 5% of 100 Mb/s for 100,000 users at 100 bytes per update → 625 updates/second → 160 seconds ≈ 3 minutes to update all users.
• Route updates are caused by contractual or physical topology changes; these could be scheduled, with a grace period of, say, 30 minutes.
• Conclusion: manageable.
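The server-side arithmetic above can be checked in a couple of lines (a trivial verification script of mine, not from the talk):

```python
# NRLS update-rate sanity check: 1 ms of route-server processing per
# update gives 1000 updates/s, so a burst of 100,000 updates takes
# 100 seconds, roughly the "2 minutes" on the slide.
per_update_ms = 1
updates_per_second = 1000 // per_update_ms
updates = 100_000
seconds = updates / updates_per_second
print(seconds)  # → 100.0
```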
Architectural Problem in the Internet
[Figure: flat addressing in today’s Internet. R3 aggregates N3’s 10.0.0.0/24 into 10.0.0.0/16, but multi-homed N1’s 12.0.0.0/24 is aggregated into R1’s 12.0.0.0/16 on one path while R2 must still announce 12.0.0.0/24, adding one entry into global BGP tables.]

• Hierarchical address: inter-domain prefix + intra-domain address
• Non-hierarchical address: prefix + domain id + intra-domain address
[Figure: the same topology with provider-rooted hierarchical addresses. B1 and B2 hold a000::/16 and b000::/16; R1–R3 receive nested /32 prefixes; edge networks receive /48 prefixes. Multi-homed hosts hold one hierarchical address per provider path (e.g., a000:1:1::1000 and a000:2:1::1000) plus a non-hierarchical address (e.g., 1111:N1::1000).]
Address Allocation
• A hierarchical address prefix is a leased resource.
  • Survives connection breakdowns and node failures.
  • Renewed periodically.
  • De-allocated when the lease expires.
[Message sequence: B (holding a000::/16), R, and N open connections. R requests a prefix; B makes an allocation decision and replies add(a000:1::/32, 3 days). N then requests from R and receives add(a000:1:1::/48, 1 day). Further add messages renew the leases over time once the connections are Established.]
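Lease bookkeeping of this kind can be sketched as follows (my illustration; `PrefixLeases` and its methods are hypothetical names, not TIPP messages):

```python
# Each add() carries a lease duration; a prefix stays allocated only
# until its lease expires, unless another add() renews it. This is what
# lets allocations survive connection breakdowns: state simply times out.

class PrefixLeases:
    def __init__(self):
        self.expiry = {}  # prefix -> absolute expiry time (seconds)

    def add(self, prefix, lease_seconds, now):
        # An add() both allocates and renews.
        self.expiry[prefix] = now + lease_seconds

    def expire(self, now):
        # De-allocate prefixes whose leases have lapsed.
        dead = [p for p, t in self.expiry.items() if t <= now]
        for p in dead:
            del self.expiry[p]
        return dead

leases = PrefixLeases()
leases.add("a000:1::/32", 3 * 86400, now=0)        # 3-day lease
leases.add("a000:1:1::/48", 1 * 86400, now=0)      # 1-day lease
leases.add("a000:1:1::/48", 1 * 86400, now=86000)  # renewed before expiry
print(leases.expire(now=90000))    # → [] (both leases still valid)
print(leases.expire(now=200000))   # → ['a000:1:1::/48'] (renewal lapsed)
```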
Edge Record Origination and Distribution
• One-directional attributes are exchanged during connection setup.
• A topology update procedure distributes edge records to neighbors.
[Message sequence: after B allocates a000:1::/32 to R, each side originates an edge record — edge(B, R) and edge(R, B) — carrying one-directional reachability attributes, e.g. “Internal reachable B → R: a000:1::/32; External reachable B → R: ε; Internal reachable R → B: a000::/32; External reachable R → B: *”. The topology update procedure propagates edge(R, B) onward to N.]
A Sampled Topology

[Figure: one of the sampled simulation topologies.]
Ensuring Edge Record Consistency
• Common techniques
  • Sequence numbers (OSPF, IS-IS) with flooding
  • Timestamps with loosely synchronized clocks
• Modified Shortest Path Topology Algorithm (SPTA) [SG89]
  • Pros: no sequence numbers or timestamps; simple, proven correct
  • Cons: computation cost per update; communication cost in a theoretical worst case is unbounded
[Figure: N hears conflicting reports about edge (P1, P2) — one neighbor reports it down, the other reports it up.]
Scalable Route Discovery with Transit Policies
• Addressing must take policies into consideration.
• A natural solution: provider-rooted hierarchical addressing [Tsuchiya91] [Francis94]

[Figure: Net1–Net4 attach to ISP1 and ISP2, which connect to Backbone1 and Backbone2 via provider–customer links; the backbones peer with each other. Peering and provider–customer policies rule out some routes (marked X).]
Design Components and Requirements
• Design components:
  • Route discovery
  • Route representation
  • Route failure handling
  • Provider compensation
• Design requirements:
  • Scalable
  • Efficient
  • Robust
  • Heterogeneous user choice
  • Practical provider compensation
Design Overview of TIPP
• Design focus: simplicity and correctness
  • Address allocation: straightforward
  • Topology information propagation: tricky (OSPF, RFC 2328: ~244 pages; BGP, RFC 1771: ~57 pages)
• Policy-controlled topology propagation
  • Information hiding
  • Scope
• Key design decisions
  • Shortest Path Topology Algorithm [SG89]: simple, proven correct; no sequence numbers or timestamps
  • Not guaranteed to discover all possible routes
Policy-controlled Topology Propagation
• Edge record
  • Originated by a domain
  • Contains bidirectional attributes and a policy specification
• An edge record is propagated based on policies.

[Figure: example edge record for the R3–B2 link —
  Originator ID: R3; Neighbor ID: B2; Status: up;
  Internal reachable (R3 → B2): (0, 2::/16); External reachable (R3 → B2): *;
  Internal reachable (B2 → R3): (1, 2:1::/32); External reachable (B2 → R3): ε]
Shortest Path Topology Algorithm

[Figure: network N reaches P1 through R1 and R2, and P2 through P1. N’s summary of edge records: (N, R1, up), (N, R2, up), (R1, P1, up), (R2, P1, down), (P1, P2, up). When neighbors report conflicting states for an edge, N trusts the news from the neighbor on the shortest path to that edge.]
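The conflict-resolution rule can be sketched as follows (a much-simplified illustration of the idea, not the full SPTA; hop distances are supplied directly rather than computed over the believed topology):

```python
# "Trust the neighbor on the shortest path": when neighbors disagree
# about an edge's status, believe the neighbor closest to that edge,
# since its news traveled the fewest hops and is most likely current.

def resolve(edge, reports, dist_to_edge):
    """reports: neighbor -> reported status for `edge`;
    dist_to_edge: neighbor -> hop distance from that neighbor to the edge."""
    closest = min(reports, key=lambda nbr: dist_to_edge[nbr])
    return reports[closest]

# N hears conflicting news about edge (P1, P2): R1 (two hops away)
# says "up", while R2 (one hop away) says "down".
reports = {"R1": "up", "R2": "down"}
dist = {"R1": 2, "R2": 1}
print(resolve(("P1", "P2"), reports, dist))  # → down
```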
Scalable Route Discovery
• Three system components:
  1. A strict provider-rooted hierarchical addressing scheme
2. TIPP: a user learns his addresses and topology information.
3. NRLS: a user learns destination’s addresses and optional topology information.
• Combining information from TIPP and NRLS, a user is able to select an initial route.
• TIPP: a new protocol in NIRA
Analysis of Strict Provider-rooted Hierarchical Addressing
• The provider hierarchy is shallow.
  • Pro: a scalable core routing region.
  • Pro: scalable route discovery — a valid route to a provider implies a policy-valid route to its customers.
• Con: topology-dependent addresses
• Con: multiple address selection
Provider Compensation
• Contractual agreements.
• Users cannot use arbitrary routes.
• Providers use policy checking to prevent illegitimate route usage.
• Direct business relationships
  • Common case: verify the source address
  • General case: packet filtering
• Indirect business relationships: end users pay remote providers
  • Policy is made upon the originator or the consumer of a packet.
  • An open and general problem; work in progress with Jennifer Mulligan and David Clark.
• Various billing schemes are possible: flat fee, usage-based billing.
Protocol Organization (TIPP)
• Control plane: TIPP runs over a reliable transport connection.
• Connection management states: Idle, Connect, Active, OpenSent, OpenConfirm, AddreqConfirm, AddrSynced, Established.
• Transitions (receive / send): open / address request; address request / address; address / topology; topology / keepalive; keepalive.
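The state machine named above can be sketched as a transition table (the event semantics are simplified from the slide's diagram and partly my assumption; only the happy path after OpenSent is shown):

```python
# TIPP connection management as a (state, received message) table
# mapping to (next state, message to send).

TRANSITIONS = {
    ("OpenSent", "open"): ("OpenConfirm", "address request"),
    ("OpenConfirm", "address request"): ("AddreqConfirm", "address"),
    ("AddreqConfirm", "address"): ("AddrSynced", "topology"),
    ("AddrSynced", "topology"): ("Established", "keepalive"),
}

def step(state, msg):
    """Return (next_state, reply) for a received message; stay put otherwise."""
    return TRANSITIONS.get((state, msg), (state, None))

state = "OpenSent"
for msg in ["open", "address request", "address", "topology"]:
    state, reply = step(state, msg)
print(state)  # → Established
```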
TIPP in Action
[Message sequence: B (holding 1::/16), R, and N open connections. During address allocation, R requests and receives 1:1::/32 from B, and N requests and receives 1:1:1::/48 from R. Topology updates then propagate edge records — (B, R) up and (R, B) up — down to N. Once Established, the host address 1:1:1::1000 is ready to use.]
Evaluation Checklist
• Scalability
  • Size of core
  • Number of address prefixes
  • Number of link records
  • Size of forwarding tables
• Forwarding efficiency
• Dynamic networks
  • Communication cost of TIPP
  • Convergence time of TIPP
  • Route setup time with reactive failure detection
• Workload of NRLS
Scalable Route Discovery without Policy Routing
• A well-studied problem
• At the heart of scalable routing: an ingenious addressing scheme
  • Nice theoretical bound: O(log N), where N is the total number of nodes
  • Cluster-based [KK77], landmark routing [Tsuchiya88]

[Figure: a three-level cluster hierarchy — top-level clusters 1, 2, 3; second-level clusters 1.1, 1.2, 2.1, 2.2, 3.1, 3.2; leaves such as 1.1.1, 2.1.1, 3.1.1, 3.2.1.]