NIRA: A New Internet Routing Architecture
Xiaowei Yang
MIT CSAIL
Presented by:
Prasad
Why NIRA?
• Problems with today’s routing system
  • No user choice
  • Does not scale well: continuing growth of global routing state
  • No fault isolation
• NIRA solves these problems.
No User Choice!
• Unlike in the telephone system, users cannot choose wide-area providers separately from local providers.
  • Pricing, quality of service, security…
• Wide-area providers cannot offer differentiated services directly to users.
  • Quality of service
[Figure: the last-mile duopoly/monopoly — users reach the backbones only through local providers such as Verizon or Comcast; 21 million broadband subscribers in June 2003.]
We Want to Let Users Choose Domain-Level Routes
• Our hypothesis:
  • User choice stimulates competition.
  • Competition fosters innovation.
• Validation requires market deployment.
  • NIRA: the technical foundation.
[Figure: a user’s local ISP connects to competing wide-area providers such as AT&T and UUNET.]
Continuing Growth of Global Routing State
• Real world requirements such as multi-homing are not well supported.
Courtesy of http://bgp.potaroo.net/
No Fault Isolation
• Local failure causes global routing update.
• Routing loops and packet drops occur in the transient state.
• Routing convergence takes on the order of minutes.
Our Approach:
NIRA
Overview of NIRA
• A scalable architecture that gives users the ability to select routes.
• “User” is an abstract entity, e.g., a software agent.
• “Domain-level” choices
  • Encourage ISP competition
  • Individual domain’s decision to offer “router-level” choices
Design Overview (1): Route Discovery

[Topology figure: core domains B1–B4, regional providers R1–R10, and edge networks N1–N18; users Bob, Alice, and Cindy attach at the edge. Each user discovers the part of the topology on its routes to the core; one link is marked X.]
Design Overview (2): Route Representation
[Topology figure: the same network; the route between Bob and Alice is represented by a pair of addresses.]
Design Overview (3): Failure Handling

[Topology figure: the same network; after a link failure, Bob switches to an alternate route.]
• Provider compensation will not be discussed in detail.
System Components of NIRA
• Addressing
• Route discovery
• Name-to-Route mapping
• Route failure handling
NIRA’s Addressing
• Strict provider-rooted hierarchical addressing
  • An address represents a valid route to the core.
[Figure: provider-rooted prefix allocation. Core providers B1 and B2 hold 1::/16 and 2::/16; R1, R2, and R3 receive 1:1::/32, 1:2::/32, 1:3::/32, and 2:1::/32; edge networks N1–N3 receive 1:1:1::/48, 1:2:1::/48, 1:2:2::/48, 1:3:1::/48, and 2:1:1::/48. Bob holds addresses 1:1:1::1000 and 1:2:1::1000; Alice holds 1:3:1::2000 and 2:1:1::2000.]
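The allocation rule in the figure can be sketched in a few lines (my illustration; the `allocate` helper and the string-based prefixes are simplifications of real IPv6 prefix delegation):

```python
# Sketch of strict provider-rooted hierarchical addressing: a network
# obtains one new prefix per provider prefix, by appending its own
# subdivision label. Every prefix therefore encodes one valid route to
# the core.

def allocate(provider_prefixes, subdivision):
    """One new prefix per provider prefix, i.e. per route to the core."""
    return [p + ":" + subdivision for p in provider_prefixes]

core_b1 = ["1"]               # B1 holds 1::/16
r1 = allocate(core_b1, "1")   # R1 gets 1:1::/32
r2 = allocate(core_b1, "2")   # R2 gets 1:2::/32
n1 = allocate(r1 + r2, "1")   # multi-homed N1 gets 1:1:1::/48 and 1:2:1::/48
print(n1)  # → ['1:1:1', '1:2:1']
```

Bob, attached to N1, then derives one address per prefix (1:1:1::1000 and 1:2:1::1000), matching the figure.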
Why is NIRA’s Addressing Scalable?
• Financial factors limit the size of the core.
• The provider hierarchy is shallow.
• A domain has a limited number of providers.
Efficient Route Representation
• A source and a destination address unambiguously represent a common type of route.
• General routes may use source routing headers.
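A minimal sketch of this representation, assuming the colon-separated prefix structure from the addressing figures (the `chain` and `route` helpers are illustrative, not part of NIRA):

```python
# A (source, destination) address pair encodes a domain-level route:
# the source address spells out the uphill chain of providers to the
# core, and the destination address spells out the downhill chain, so
# together they name an "up, across, down" route unambiguously.

def chain(addr):
    """Prefixes from the host's network up to the core provider."""
    labels = addr.split("::")[0].split(":")
    return [":".join(labels[:i]) for i in range(len(labels), 0, -1)]

def route(src, dst):
    up = chain(src)                    # e.g. 1:1:1 -> 1:1 -> 1
    down = list(reversed(chain(dst)))  # e.g. 1 -> 1:3 -> 1:3:1
    # Merge at the shared core provider if there is one.
    return up + down[1:] if up[-1] == down[0] else up + down

print(route("1:1:1::1000", "1:3:1::2000"))
# → ['1:1:1', '1:1', '1', '1:3', '1:3:1']
```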
Routers’ Forwarding Tables
• Uphill table: providers
• Downhill table: customers, self
• Bridge table: all others
Example (router R1):
  Uphill table:   1::/16 → B1
  Downhill table: 1:1:1::/48 → N1, 1:1::/96 → self
Basic Forwarding Algorithm
1. Look up the destination address in the downhill table.
2. If no match, look up the source address in the uphill table.
[Figure: a packet from Bob (source 1:1:1::1000) to Alice (destination 1:3:1::2000) travels up toward the core, matching the source address in uphill tables, then down to Alice, matching the destination address in downhill tables.]
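The two-step lookup can be sketched as follows (an illustration using R1's example tables from the earlier slide; prefixes are modeled as address-string prefixes rather than real IPv6 longest-prefix matching):

```python
# Minimal sketch of NIRA's basic forwarding algorithm.

def longest_match(table, address):
    """Return the next hop for the longest prefix matching `address`."""
    best = None
    for prefix, next_hop in table.items():
        if address.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, next_hop)
    return best[1] if best else None

def forward(downhill, uphill, src, dst):
    # Step 1: destination address in the downhill table.
    hop = longest_match(downhill, dst)
    if hop is not None:
        return ("down", hop)
    # Step 2: otherwise, source address in the uphill table.
    hop = longest_match(uphill, src)
    if hop is not None:
        return ("up", hop)
    return None  # no route: drop the packet

# Router R1's tables from the slide:
downhill = {"1:1:1:": "N1", "1:1:": "self"}
uphill = {"1:": "B1"}

# A packet from Bob to Alice matches nothing in R1's downhill table,
# so it is forwarded up toward the core via B1.
print(forward(downhill, uphill, "1:1:1::1000", "1:3:1::2000"))  # → ('up', 'B1')
```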
System Components of NIRA
• Addressing
• Route discovery: Topology Information Propagation Protocol (TIPP)
• Name-to-Route mapping
• Failure handling
What does TIPP Do?
• Propagates addresses.
• Propagates the “up-graph”: providers and their interconnections on a user’s routes to the core.
[Figure: via TIPP, Bob learns his addresses 1:1:1::1000, 1:2:1::1000, and 2:2:1::1000, plus his up-graph; one link is marked X, temporarily unusable.]
What is TIPP Like?
• Supports scoped propagation.
• Provides a consistent view of the network.
• A simple algorithm, proven correct [SG89].
• No sequence numbers, no periodic refreshes, no timestamps.
Why is TIPP Scalable? (1)
• The up-graph is small.
[Topology figure: a user’s up-graph — its access networks, their providers, and the core — is a small fraction of the full topology.]
Why is TIPP Scalable? (2)
• Scoped propagation → fault isolation
[Topology figure: a failure marked X propagates only within the scope of the affected region, not network-wide.]
System Components of NIRA
• Addressing
• Route discovery
• Name-to-Route mapping
• Failure handling
Name-to-Route Lookup Service (NRLS)
• An enhanced DNS service
[Figure: Bob queries the foo.com NRLS server for Alice.foo.com and receives Alice’s addresses, 1:3:1::2000 and 2:1:1::2000.]
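One plausible way a user could combine its own addresses with the NRLS answer is sketched below (the matching heuristic is my assumption, not NIRA's specified selection logic):

```python
# After NRLS returns the destination's addresses, the sender picks a
# (source, destination) address pair. Here we prefer a pair whose
# top-level provider prefixes match, so the route stays within one
# core provider.

def top_prefix(addr):
    return addr.split(":")[0]

def pick_pair(my_addrs, dst_addrs):
    for s in my_addrs:
        for d in dst_addrs:
            if top_prefix(s) == top_prefix(d):
                return (s, d)
    # Fall back to any pair; the route then crosses between core providers.
    return (my_addrs[0], dst_addrs[0])

bob = ["1:1:1::1000", "1:2:1::1000"]
alice = ["1:3:1::2000", "2:1:1::2000"]
print(pick_pair(bob, alice))  # → ('1:1:1::1000', '1:3:1::2000')
```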
System Components of NIRA
• Addressing
• Route discovery protocol
• Name-to-Route mapping
• Failure handling
How Route Failures are Handled
• A combination of TIPP notifications and router feedback or timeouts.
• Switching addresses switches to a different route.
  • HIP [Moskowitz04], SCTP [RFC 2960], TCP Migrate [Snoeren00]
[Figure: a link on the route encoded by Bob’s address 1:1:1::1000 fails (X); Bob switches to his other address, 1:2:1::1000, to use a different route to Alice.]
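A hedged sketch of the address-switching idea (the bookkeeping below is my illustration; the cited HIP/SCTP/TCP Migrate mechanisms are what keep transport connections alive across the address change):

```python
# On a failure notification (from TIPP, router feedback, or a timeout)
# for the route encoded by the current source address, the user switches
# to another of its addresses, and hence to a different route.

def pick_source(addresses, failed):
    """Choose the first address whose route is not marked failed."""
    for a in addresses:
        if a not in failed:
            return a
    return None  # all known routes are down

bob = ["1:1:1::1000", "1:2:1::1000"]
failed = {"1:1:1::1000"}  # this address's route was reported unusable
print(pick_source(bob, failed))  # → 1:2:1::1000
```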
NIRA Solves these Problems:
• User choice
  • Choosing addresses → choosing routes → choosing providers
• Scalability
  • Modularized route discovery.
  • Constrained failure propagation.
Evaluation
Data Sets
• Domain-level topologies from BGP routing tables
• Inferred domain relationships [Gao00] [Subramanian02]
• Not completely accurate, but the best available practice.
• Data from 2001 to 2004 [Agarwal]
Evaluation of Scalability
• Methodology: measure the amount of state each domain keeps under NIRA
  • Number of providers in the core
  • Number of address prefixes
  • Size of up-graphs
  • Size of forwarding tables
• Conclusion: scalable in practice
The Internet Continues to Grow
• In practice:
  • The level of provider hierarchy (h) is shallow.
  • A domain has a limited number (p) of providers.
• In theory: grows as p^h.
Provider-rooted Hierarchical Addressing is Practical.
Up-graphs are Small.
• Analysis
  • Level of provider hierarchy: h
  • Number of a domain’s providers: p
  • Number of a domain’s peers: q
  • Up-graph size: Σ_{i=1}^{h} p^(i-1) (p + q)
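The bound can be sanity-checked numerically (a small script of mine, not from the talk):

```python
# Up-graph size bound from the slide: sum over i = 1..h of p^(i-1) * (p + q).

def upgraph_size(h, p, q):
    return sum(p ** (i - 1) * (p + q) for i in range(1, h + 1))

# With a shallow hierarchy (h = 3) and a few providers and peers per
# domain (p = 2, q = 3), the up-graph stays small:
print(upgraph_size(3, 2, 3))  # → 35
```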
Evaluation of TIPP in Dynamic Networks
• Methodology: packet-level simulations using sampled topologies
• Conclusions
  • Low communication cost
  • Fast convergence
Communication Cost of TIPP is Low.
• Average: total messages (bytes) / link / failure
• Maximum: maximum messages (bytes) seen over one link / failure
• Scalable: scoped propagation
• Convergence: no message churning
TIPP Converges Fast
• Link delays are uniformly distributed in [10ms, 110ms].
• Single-failure convergence time is proportional to the shortest-path propagation delay.
Conclusion
• User choice
  • Choosing addresses → choosing routes → choosing providers
• Scalability
  • Modularized route discovery.
  • Constrained failure propagation.
• Evaluation shows NIRA is practical.
• Looking forward
  • New provider compensation models
  • Stable routing with user choice
  • Deployment of NIRA
Core Routing Region is Scalable
• Financial factors limit the size of the core.
Forwarding Tables
• No need to dynamically compute paths to reach a prefix.
• Common case: small.
• Analysis:
  • Number of prefixes: r
  • Number of customers: c
  • Number of peers: q
  • Table size: r + r + r·c + q
TIPP in Dynamic Networks
• Simulation topologies
  • Pick random leaf domains.
  • Recursively include their providers and peers.
Workload Analysis of NRLS Update
• A fundamental tradeoff:
  • Topology change causes address change.
  • Root servers reside in top-level providers.
• Route record updates mimic a renumbering event. How often can route updates happen?
  • Route server processing time: 1 ms per update → 1000 updates/second/server; 100,000 updates → 100 seconds ≈ 2 minutes.
  • Bandwidth: 5% of 100 Mb/s for 100,000 users at 100 bytes per update → 625 updates/second → 160 seconds ≈ 3 minutes to update all users.
• Route updates are caused by contractual or physical topology changes; these could be scheduled, with a grace period of, say, 30 minutes.
• Conclusion: manageable.
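The server-side arithmetic above can be checked in a couple of lines (a trivial verification script of mine, not from the talk):

```python
# NRLS update-rate sanity check: 1 ms of route-server processing per
# update gives 1000 updates/s, so a burst of 100,000 updates takes
# 100 seconds, roughly the "2 minutes" on the slide.
per_update_ms = 1
updates_per_second = 1000 // per_update_ms
updates = 100_000
seconds = updates / updates_per_second
print(seconds)  # → 100.0
```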
Architectural Problem in the Internet
[Figure: flat addressing in today’s Internet. R3 aggregates N3’s 10.0.0.0/24 into 10.0.0.0/16, but multi-homed N1’s 12.0.0.0/24 is aggregated into R1’s 12.0.0.0/16 on one path while R2 must still announce 12.0.0.0/24, adding one entry into global BGP tables.]

• Hierarchical address: inter-domain prefix + intra-domain address
• Non-hierarchical address: prefix + domain id + intra-domain address
[Figure: the same topology with provider-rooted hierarchical addresses. B1 and B2 hold a000::/16 and b000::/16; R1–R3 receive nested /32 prefixes; edge networks receive /48 prefixes. Multi-homed hosts hold one hierarchical address per provider path (e.g., a000:1:1::1000 and a000:2:1::1000) plus a non-hierarchical address (e.g., 1111:N1::1000).]
Address Allocation
• A hierarchical address prefix is a leased resource.
  • Survives connection breakdowns and node failures.
  • Renewed periodically.
  • De-allocated when the lease expires.
[Message sequence: B (holding a000::/16), R, and N open connections. R requests a prefix; B makes an allocation decision and replies add(a000:1::/32, 3 days). N then requests from R and receives add(a000:1:1::/48, 1 day). Further add messages renew the leases over time once the connections are Established.]
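Lease bookkeeping of this kind can be sketched as follows (my illustration; `PrefixLeases` and its methods are hypothetical names, not TIPP messages):

```python
# Each add() carries a lease duration; a prefix stays allocated only
# until its lease expires, unless another add() renews it. This is what
# lets allocations survive connection breakdowns: state simply times out.

class PrefixLeases:
    def __init__(self):
        self.expiry = {}  # prefix -> absolute expiry time (seconds)

    def add(self, prefix, lease_seconds, now):
        # An add() both allocates and renews.
        self.expiry[prefix] = now + lease_seconds

    def expire(self, now):
        # De-allocate prefixes whose leases have lapsed.
        dead = [p for p, t in self.expiry.items() if t <= now]
        for p in dead:
            del self.expiry[p]
        return dead

leases = PrefixLeases()
leases.add("a000:1::/32", 3 * 86400, now=0)        # 3-day lease
leases.add("a000:1:1::/48", 1 * 86400, now=0)      # 1-day lease
leases.add("a000:1:1::/48", 1 * 86400, now=86000)  # renewed before expiry
print(leases.expire(now=90000))    # → [] (both leases still valid)
print(leases.expire(now=200000))   # → ['a000:1:1::/48'] (renewal lapsed)
```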
Edge Record Origination and Distribution
• One-directional attributes are exchanged during connection setup.
• A topology update procedure distributes edge records to neighbors.
[Message sequence: after B allocates a000:1::/32 to R, each side originates an edge record — edge(B, R) and edge(R, B) — carrying one-directional reachability attributes, e.g. “Internal reachable B → R: a000:1::/32; External reachable B → R: ε; Internal reachable R → B: a000::/32; External reachable R → B: *”. The topology update procedure propagates edge(R, B) onward to N.]
A Sampled Topology

[Figure: one of the sampled simulation topologies.]
Ensuring Edge Record Consistency
• Common techniques
  • Sequence numbers (OSPF, IS-IS) with flooding
  • Timestamps with loosely synchronized clocks
• Modified Shortest Path Topology Algorithm (SPTA) [SG89]
  • Pros: no sequence numbers or timestamps; simple, proven correct
  • Cons: computation cost per update; communication cost in a theoretical worst case is unbounded
[Figure: N hears conflicting reports about edge (P1, P2) — one neighbor reports it down, the other reports it up.]
Scalable Route Discovery with Transit Policies
• Addressing must take policies into consideration.
• A natural solution: provider-rooted hierarchical addressing [Tsuchiya91] [Francis94]

[Figure: Net1–Net4 attach to ISP1 and ISP2, which connect to Backbone1 and Backbone2 via provider–customer links; the backbones peer with each other. Peering and provider–customer policies rule out some routes (marked X).]
Design Components and Requirements
• Design components:
  • Route discovery
  • Route representation
  • Route failure handling
  • Provider compensation
• Design requirements:
  • Scalable
  • Efficient
  • Robust
  • Heterogeneous user choice
  • Practical provider compensation
Design Overview of TIPP
• Design focus: simplicity and correctness
  • Address allocation: straightforward
  • Topology information propagation: tricky (OSPF, RFC 2328: ~244 pages; BGP, RFC 1771: ~57 pages)
• Policy-controlled topology propagation
  • Information hiding
  • Scope
• Key design decisions
  • Shortest Path Topology Algorithm [SG89]: simple, proven correct; no sequence numbers or timestamps
  • Not guaranteed to discover all possible routes
Policy-controlled Topology Propagation
• Edge record
  • Originated by a domain
  • Contains bidirectional attributes and a policy specification
• An edge record is propagated based on policies.

[Figure: example edge record for the R3–B2 link —
  Originator ID: R3; Neighbor ID: B2; Status: up;
  Internal reachable (R3 → B2): (0, 2::/16); External reachable (R3 → B2): *;
  Internal reachable (B2 → R3): (1, 2:1::/32); External reachable (B2 → R3): ε]
Shortest Path Topology Algorithm

[Figure: network N reaches P1 through R1 and R2, and P2 through P1. N’s summary of edge records: (N, R1, up), (N, R2, up), (R1, P1, up), (R2, P1, down), (P1, P2, up). When neighbors report conflicting states for an edge, N trusts the news from the neighbor on the shortest path to that edge.]
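The conflict-resolution rule can be sketched as follows (a much-simplified illustration of the idea, not the full SPTA; hop distances are supplied directly rather than computed over the believed topology):

```python
# "Trust the neighbor on the shortest path": when neighbors disagree
# about an edge's status, believe the neighbor closest to that edge,
# since its news traveled the fewest hops and is most likely current.

def resolve(edge, reports, dist_to_edge):
    """reports: neighbor -> reported status for `edge`;
    dist_to_edge: neighbor -> hop distance from that neighbor to the edge."""
    closest = min(reports, key=lambda nbr: dist_to_edge[nbr])
    return reports[closest]

# N hears conflicting news about edge (P1, P2): R1 (two hops away)
# says "up", while R2 (one hop away) says "down".
reports = {"R1": "up", "R2": "down"}
dist = {"R1": 2, "R2": 1}
print(resolve(("P1", "P2"), reports, dist))  # → down
```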
Scalable Route Discovery
• Three system components:
  1. A strict provider-rooted hierarchical addressing scheme
2. TIPP: a user learns his addresses and topology information.
3. NRLS: a user learns destination’s addresses and optional topology information.
• Combining information from TIPP and NRLS, a user is able to select an initial route.
• TIPP: a new protocol in NIRA
Analysis of Strict Provider-rooted Hierarchical Addressing
• The provider hierarchy is shallow.
  • Pro: a scalable core routing region.
  • Pro: scalable route discovery — a valid route to a provider implies a policy-valid route to its customers.
• Con: topology-dependent addresses
• Con: multiple address selection
Provider Compensation
• Contractual agreements.
• Users cannot use arbitrary routes.
• Providers use policy checking to prevent illegitimate route usage.
• Direct business relationships
  • Common case: verify the source address
  • General case: packet filtering
• Indirect business relationships: end users pay remote providers
  • Policy is made upon the originator or the consumer of a packet.
  • An open and general problem; work in progress with Jennifer Mulligan and David Clark.
• Various billing schemes are possible: flat fee, usage-based billing.
Protocol Organization (TIPP)
• Control plane: TIPP runs over a reliable transport connection.
• Connection management states: Idle, Connect, Active, OpenSent, OpenConfirm, AddreqConfirm, AddrSynced, Established.
• Transitions (receive / send): open / address request; address request / address; address / topology; topology / keepalive; keepalive.
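The state machine named above can be sketched as a transition table (the event semantics are simplified from the slide's diagram and partly my assumption; only the happy path after OpenSent is shown):

```python
# TIPP connection management as a (state, received message) table
# mapping to (next state, message to send).

TRANSITIONS = {
    ("OpenSent", "open"): ("OpenConfirm", "address request"),
    ("OpenConfirm", "address request"): ("AddreqConfirm", "address"),
    ("AddreqConfirm", "address"): ("AddrSynced", "topology"),
    ("AddrSynced", "topology"): ("Established", "keepalive"),
}

def step(state, msg):
    """Return (next_state, reply) for a received message; stay put otherwise."""
    return TRANSITIONS.get((state, msg), (state, None))

state = "OpenSent"
for msg in ["open", "address request", "address", "topology"]:
    state, reply = step(state, msg)
print(state)  # → Established
```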
TIPP in Action
[Message sequence: B (holding 1::/16), R, and N open connections. During address allocation, R requests and receives 1:1::/32 from B, and N requests and receives 1:1:1::/48 from R. Topology updates then propagate edge records — (B, R) up and (R, B) up — down to N. Once Established, the host address 1:1:1::1000 is ready to use.]
Evaluation Checklist
• Scalability
  • Size of core
  • Number of address prefixes
  • Number of link records
  • Size of forwarding tables
• Forwarding efficiency
• Dynamic networks
  • Communication cost of TIPP
  • Convergence time of TIPP
  • Route setup time with reactive failure detection
• Workload of NRLS
Scalable Route Discovery without Policy Routing
• A well-studied problem
• At the heart of scalable routing: an ingenious addressing scheme
  • Nice theoretical bound: O(log N), where N is the total number of nodes
  • Cluster-based [KK77], landmark routing [Tsuchiya88]

[Figure: a three-level cluster hierarchy — top-level clusters 1, 2, 3; second-level clusters 1.1, 1.2, 2.1, 2.2, 3.1, 3.2; leaves such as 1.1.1, 2.1.1, 3.1.1, 3.2.1.]