Routing Scalability

Dimitri Papadimitriou, Alcatel-Lucent Bell


Today

• 200+ million domain delegations [Verisign], over a billion domain names
• 350k routing paths (350k BGP routing entries at each DFZ router) and 35k autonomous systems
• Name -> address resolution
• IP address-based routing
• Distributed adaptive routing: BGP (selective push)

Question?

• Total number of files?
• Name -> name resolution (???)
• Domain name-based routing (???)
• File name-based routing (???)


Routing Scaling Analysis

• Two-dimensional problem
– Problem 1: Addressing (amplification of address prefix de-aggregation)
– Problem 2: Routing (inter-domain routing protocol (BGP) limitations)

Problem 1: Addressing (Amplification of address prefix de-aggregation)

• Originally, host IP addresses were Provider Allocated (PA) and assigned according to their topological location in the network
– The adoption of Classless Inter-Domain Routing (CIDR) [RFC4632] in the mid-90s to perform address aggregation was considered sufficient to handle address scaling

• The conditions needed to achieve efficient address aggregation and relatively small routing tables (tradeoff between aggregation and granularity of routing information) are no longer met [RFC4984]

• Deterioration root causes
– Host mobility, site multi-homing (~25% of sites), traffic engineering (prefix de-aggregation)
– RIR policies allocating Provider Independent (PI) addresses, which are not topologically aggregatable, thus making CIDR ineffective

• Result: the routing table grows even if the network itself would not be growing (the routing protocol must scale not only with increasing network size)
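A minimal sketch of this effect, assuming a DFZ router can aggregate only the prefixes it reaches over the same selected path. The path labels and prefixes are hypothetical (private and documentation ranges), and CIDR-style merging is done with Python's standard `ipaddress.collapse_addresses`:

```python
import ipaddress
from collections import defaultdict


def dfz_entries(announcements):
    """Count routing entries kept for (prefix, path) announcements.

    Prefixes reached over the same (hypothetical) path label are merged
    CIDR-style; a prefix reached over another path stays a separate entry.
    Simplification: overlaps between prefixes on different paths are ignored.
    """
    by_path = defaultdict(list)
    for prefix, path in announcements:
        by_path[path].append(ipaddress.ip_network(prefix))
    return sum(len(list(ipaddress.collapse_addresses(nets)))
               for nets in by_path.values())


# PA: four contiguous customer /24s behind one provider collapse into one /22.
pa = [(f"10.0.{i}.0/24", "via-P1") for i in range(4)]
# Multi-homing / traffic engineering: one /24 is also reached via P2.
multihomed = [("10.0.0.0/22", "via-P1"), ("10.0.1.0/24", "via-P2")]
# PI: non-contiguous blocks cannot be merged, even over a single path.
pi = [("192.0.2.0/24", "via-P1"), ("198.51.100.0/24", "via-P1"),
      ("203.0.113.0/24", "via-P1")]

for name, table in [("PA", pa), ("multi-homed", multihomed), ("PI", pi)]:
    print(name, dfz_entries(table))  # PA 1, multi-homed 2, PI 3
```

The PA case needs one entry, multi-homing adds the more-specific /24 as a second entry, and the PI blocks stay as three entries. Per-path merging is a deliberate simplification, not an optimal table compression.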

Problem 2: Inter-domain routing protocol limitations

1. BGP implementation and configuration: may be circumvented

2. BGP routing algorithm: shortest AS-path vector routing
– BGP (as any path-vector routing protocol): slow convergence due to uninformed path exploration
– BGP suffers from churn/overhead, which increases the load on routers, due to topological failures and traffic engineering (prefix de-aggregation)

3. BGP protocol usage: policy-based routing (without policy distribution)
– Intra-AS oscillations: MED-induced oscillations
– Inter-AS oscillations: local preference over shortest AS-path
– Conflicting policy interactions
• Unintended stable state (wedgies)
• Unintended unstable state (dispute wheels)
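The dispute-wheel case can be illustrated with a brute-force stability check on the BAD GADGET configuration from the stable paths problem literature (Griffin, Shepherd and Wilfong). This is a sketch under simplifying assumptions: each AS ranks a fixed list of permitted paths to destination AS 0, and an assignment is stable when every AS already uses its most preferred path consistent with its neighbours' current choices. BGP messages and timers are not modelled.

```python
from itertools import product

# BAD GADGET: ASes 1, 2, 3 reach destination AS 0 directly or via their
# clockwise neighbour, and each prefers the path through its neighbour.
BAD_GADGET = {
    1: [(1, 2, 0), (1, 0)],
    2: [(2, 3, 0), (2, 0)],
    3: [(3, 1, 0), (3, 0)],
}


def best_available(node, prefs, assignment):
    """Most preferred permitted path of `node`: (node, m, ...) is usable only
    if m is the destination 0 or m currently uses exactly (m, ...)."""
    for path in prefs[node]:
        next_hop = path[1]
        if next_hop == 0 or assignment[next_hop] == path[1:]:
            return path
    return None  # no permitted path available


def stable_states(prefs):
    """All assignments in which no AS wants to switch path (brute force)."""
    nodes = sorted(prefs)
    choices = [prefs[n] + [None] for n in nodes]
    states = []
    for combo in product(*choices):
        assignment = dict(zip(nodes, combo))
        if all(best_available(n, prefs, assignment) == assignment[n] for n in nodes):
            states.append(assignment)
    return states


print(stable_states(BAD_GADGET))  # []: the dispute wheel has no stable state

# Let AS 3 prefer its direct path: the wheel is broken.
GOOD = {**BAD_GADGET, 3: [(3, 0), (3, 1, 0)]}
print(stable_states(GOOD))  # [{1: (1, 0), 2: (2, 3, 0), 3: (3, 0)}]
```

With no stable assignment, best-path selection in the first configuration can never settle; changing a single local preference removes the dispute wheel and leaves exactly one stable state.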

Internet Growth Rate

• Traffic
– Traffic volume (per month): 8 to 9 exabytes
– Traffic growth rate: 50% (+/- 5%) per year

• Routing table size
– Number of active Routing Table (RT) entries: 345k (Sep. 2010)
– Growth rate: 15%-25% per year

• Autonomous Systems (AS)
– Number of advertised ASes: 35k (Sep. 2010)
– Growth rate: 10% per year
– Ratio: ~10 IPv4 prefixes per AS

• Characteristic AS-path length: steady at ~3.7

• AS transit interconnection degree: growing (2.56 to 2.60)

Number of ASes advertised in the BGP routing table: 35,269

Ratio prefix/AS: ~10
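The growth rates above translate into doubling times via d = ln 2 / ln(1 + r). A small sketch of that arithmetic, plus a constant-rate extrapolation of the Sep. 2010 table size (the constant rate and the 5-year horizon are illustrative assumptions, not a forecast):

```python
import math


def doubling_time(annual_rate):
    """Years needed to double at a constant annual growth rate (0.5 = 50%)."""
    return math.log(2) / math.log(1 + annual_rate)


def project(value, annual_rate, years):
    """Constant-rate compound extrapolation (illustrative only)."""
    return value * (1 + annual_rate) ** years


print(f"traffic @50%/yr: doubles in {doubling_time(0.50):.1f} years")
for rate in (0.15, 0.25):
    print(f"RT entries @{rate:.0%}/yr: doubles in {doubling_time(rate):.1f} years, "
          f"~{project(345_000, rate, 5):,.0f} entries 5 years after Sep. 2010")
print(f"prefixes per AS: {345_000 / 35_269:.1f}")
```

At 50% per year traffic doubles in about 1.7 years, whereas at 15%-25% per year the routing table doubles in about 3.1 to 5.0 years; 345k / 35,269 gives the ~10 (9.8) prefixes per AS quoted above.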

Jan. 1 2011 (low-end predictions)

- Size: 370,000 to 400,000 prefixes
- Update rate: 2.8M prefix updates per day
- Withdrawal rate: 1.6M withdrawals per day
- 550 MB memory; 120% of a 1.5 GHz processor

Source: BGP Routing Table Analysis Reports - http://bgp.potaroo.net

Figure: Growth of active BGP entries (Jan. 1989 to Sep. 2010)

Jan. 1 2006

- FIB size: 176,000 prefixes
- Update rate: 0.7M prefix updates per day
- Withdrawal rate: 0.4M prefix withdrawals per day

Jan. 1 2009

- FIB size: 275,000 to 300,000 prefixes
- Update rate: 1.7M prefix updates per day
- Withdrawal rate: 0.9M withdrawals per day
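A quick consistency check on the Jan. 2006 and Jan. 2009 figures: the overall growth factor and the implied compound annual rate over the three years (for the 2009 FIB size, both ends of the quoted range are used).

```python
def growth(before, after, years):
    """Overall growth factor and implied compound annual growth rate."""
    factor = after / before
    return factor, factor ** (1 / years) - 1


YEARS = 3  # Jan. 2006 -> Jan. 2009
for label, before, after in [
    ("prefix updates/day", 0.7e6, 1.7e6),
    ("withdrawals/day", 0.4e6, 0.9e6),
    ("FIB size (low end)", 176_000, 275_000),
    ("FIB size (high end)", 176_000, 300_000),
]:
    factor, cagr = growth(before, after, YEARS)
    print(f"{label:20} x{factor:.2f} ({cagr:.0%} per year)")
```

The update and withdrawal factors come out at about 2.43 and 2.25, in line with the factor of about 2.25-2.5 cited for this period [Huston07], and the FIB grew by roughly 16% to 19% per year, inside the 15%-25% range.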

Current Internet Growth Rate

• BGP update dynamics (routing convergence)
– Between Jan. 2006 and Jan. 2009, daily prefix update and withdrawal rates increased by a factor of about 2.25-2.5 [Huston07]
– Average: 2-3 per sec.; peak: O(1000) per sec.
– BGP suffers from churn, which increases the load on routers, due to topological failures and traffic engineering (prefix de-aggregation)
– BGP's path-vector nature amplifies these problems (path exploration)

Relationship to AS topology

• Meshed AS topology (average AS degree ~2.5-3) with a high clustering coefficient (~0.4)

• BGP uninformed path exploration
– BGP listens without understanding (local BGP route selection)
– BGP routing updates are not coordinated in space and time, only rate-limited (MRAI timer) -> state coupling between topologically correlated BGP updates

Figure: Cycle -> Exploration
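Path exploration can be reproduced with a deliberately simplified path-vector model: a clique of ASes, synchronous rounds in which every AS re-selects the shortest loop-free path offered by its neighbours in the previous round (lowest path as tie-breaker), and no MRAI timer. These modelling choices are assumptions of the sketch, not BGP's actual timing. When origin AS 0 withdraws its prefix, the remaining ASes try successively longer stale paths before giving up:

```python
def simulate_withdrawal(n):
    """Clique of ASes 0..n-1; AS 0 withdraws its prefix at time 0.

    Returns (rounds until no AS changes its path, set of paths tried per AS).
    """
    ases = range(1, n)
    paths = {i: (i, 0) for i in ases}        # converged state before the withdrawal
    explored = {i: {paths[i]} for i in ases}
    rounds = 0
    while True:
        rounds += 1
        new = {}
        for i in ases:
            # AS 0 no longer announces; only neighbours' previous-round paths
            # are offered, minus those already containing i (loop detection).
            offers = [(i,) + paths[j] for j in ases
                      if j != i and paths[j] is not None and i not in paths[j]]
            # Shortest AS path first, lowest path as a deterministic tie-breaker.
            new[i] = min(offers, key=lambda p: (len(p), p)) if offers else None
        if new == paths:
            return rounds, explored
        paths = new
        for i in ases:
            if paths[i] is not None:
                explored[i].add(paths[i])


rounds, explored = simulate_withdrawal(4)
print("rounds:", rounds)  # 4
for asn in sorted(explored):
    print(asn, sorted(explored[asn], key=len))
```

With four ASes, AS 2 and AS 3 each try paths up to four ASes long, built from stale information, before every AS concludes the prefix is unreachable; in BGP these successive announcements are additionally rate-limited by the MRAI timer.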

Space segmentation: lasagne or spaghetti

Figure: Name, ID and Locator (IP address prefix) spaces; host-driven vs network-driven partition; abstraction relation 1:n (network) vs relation m:n, m>n (indirection)

Network layer vs Overlay routing

1. Either focus on the technological limits (scalability, resiliency, stability, convergence, etc.) and operational limits (policing) of existing network-level routing

2. Or build an infrastructure-based overlay on top of the existing IP network layer: an additional layer of indirection

Figure: 1. Revisit network “routing functions” (edge, in/out) vs 2. Infrastructure-based overlay


An additional layer of indirection brings benefits such as customization, independence and flexibility, but also detrimental effects:

– Conflicting cross-layer interactions that impact overall network performance (amplified by selfish routing, where each individual user/overlay controls the routing of an infinitesimal amount of traffic to optimize its own performance without considering network-wide criteria)
– Scalability (rate x state)
– Resiliency (user-initiated states) and security (interaction with user-initiated states)
– Genericity and evolvability

Note: the looser the coupling, the higher the flexibility; the stronger the coupling, the higher the performance (pick one)!

Effects of indirections

“Any problem in computer science can be solved with another layer of indirection.” (David Wheeler)

Multiple control mechanisms: conflicting cross-layer interactions (due to different performance objectives and contention)

Figure: overlay control, overlay control info and overlay forwarding info (encapsulation/decapsulation of overlay traffic via open interfaces) layered over the router data path (packet in/out, MF classifier, TC, lookup by longest matching prefix in the FIB, RIB, routing engine)

Indirection = (generic) infrastructure-based overlay routing

… “But that usually will create another problem.” (rest of the quote)
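The data path in the figure can be sketched as two chained lookups: an overlay stage (classifier plus overlay forwarding table, then encapsulation) in front of the router's ordinary longest-matching-prefix FIB lookup. Everything here is hypothetical, including the tables, the interface names and the use of a destination port to classify overlay traffic; the point is only that the indirection adds its own state and its own lookup in front of the network layer.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    dst: str                          # network-layer destination address
    dport: int                        # transport port, read by the classifier
    inner: Optional["Packet"] = None  # encapsulated packet, if any


# Hypothetical FIB (prefix -> outgoing interface), as installed from the RIB.
FIB = [(ipaddress.ip_network(p), ifname) for p, ifname in [
    ("0.0.0.0/0", "if0"), ("192.0.2.0/24", "if1"), ("192.0.2.128/25", "if2")]]

# Hypothetical overlay forwarding info: overlay destination -> next overlay hop.
OVERLAY_FWD = {"10.9.9.9": "192.0.2.200"}
OVERLAY_PORT = 9999  # assumption: overlay traffic is recognised by this port


def lpm(dst):
    """Longest-matching-prefix lookup by linear scan (routers use tries/TCAM)."""
    addr = ipaddress.ip_address(dst)
    return max((net.prefixlen, ifname) for net, ifname in FIB if addr in net)[1]


def forward(pkt):
    """Return the outgoing interface and the packet actually sent."""
    if pkt.dport == OVERLAY_PORT and pkt.dst in OVERLAY_FWD:       # classifier
        pkt = Packet(OVERLAY_FWD[pkt.dst], OVERLAY_PORT, inner=pkt)  # encapsulation
    return lpm(pkt.dst), pkt


print(forward(Packet("192.0.2.5", 80))[0])    # if1: plain IP, /24 match
print(forward(Packet("10.9.9.9", 9999))[0])   # if2: outer header hits the /25
print(forward(Packet("10.9.9.9", 80))[0])     # if0: not overlay, default route
```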

Routing Scaling dependency on Addressing

• Topology-dependent: the locator address structure is designed specifically to enable “topological aggregation”, so that addressing scales with the routing system

• Topology-independent: the address space is used as a flat ID, to avoid the impact of topological changes (on TCP) and of provider renumbering

Figure: address prefix assignment, from topology-independent (host/site; address = flat ID) to topology-dependent (network; addressing follows topology; address = Loc. ID)

Locator/Identifier (Loc/ID) Separation (1)

• Motivation: restore the aggregatability of routing state by "segmenting" the address space (hosts vs networks) and the respective allocation policies
– Loc/ID split using different numbering spaces: end-point identifiers (EIDs), allocated in blocks per organization, and routing locators (RLOCs), which are topology-congruent and aggregatable

• Principles
– Segmentation between the topology-independent endpoint identifier (= user address space) and the topology-dependent locator (= network address space)
– Resolution via a distributed database (= mapping database) holding the information needed to translate hosts' topology-independent addresses (identifiers) into topology-dependent addresses (locators)
– Traffic-driven at the ingress "edge": forwarding entries are preceded by ID-to-RLOC mapping entries (encapsulation), populated as incoming traffic arrives
– Memory-less at the egress "edge": it does not keep track of the sources of ID-to-RLOC mapping requests (in case of a mapping change, the initial requester is not directly updated)
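The ingress behaviour (traffic-driven map-cache, resolution on the first packet, encapsulation) can be sketched as follows. The mapping database contents, the `resolve` helper standing in for the Map-Request/Map-Reply exchange, and the choice to drop the first packet on a cache miss are all assumptions for illustration; this is not the LISP specification's packet format or state machine.

```python
import ipaddress

# Hypothetical mapping database: EID prefix -> RLOC of the site's ETR.
MAPPING_DB = {"10.1.0.0/16": "198.51.100.1", "10.2.0.0/16": "203.0.113.7"}


def resolve(eid):
    """Stand-in for Map-Request <?, EID> / Map-Reply <RLOC, EID prefix>."""
    addr = ipaddress.ip_address(eid)
    for prefix, rloc in MAPPING_DB.items():
        if addr in ipaddress.ip_network(prefix):
            return ipaddress.ip_network(prefix), rloc
    raise LookupError(f"no mapping for {eid}")


class ITR:
    """Ingress edge: map-cache entries are created only when traffic arrives."""

    def __init__(self):
        self.map_cache = {}     # EID prefix -> RLOC
        self.map_requests = 0

    def _lookup(self, eid):
        addr = ipaddress.ip_address(eid)
        hits = [(p.prefixlen, rloc) for p, rloc in self.map_cache.items() if addr in p]
        return max(hits)[1] if hits else None   # longest match over EID prefixes

    def send(self, src_eid, dst_eid, payload):
        rloc = self._lookup(dst_eid)
        if rloc is None:
            # Cache miss: resolve the mapping; this first packet is dropped.
            self.map_requests += 1
            prefix, rloc = resolve(dst_eid)
            self.map_cache[prefix] = rloc
            return None
        # Encapsulate: outer header to the RLOC, inner header keeps the EIDs.
        return {"outer_dst": rloc, "inner": (src_eid, dst_eid, payload)}


itr = ITR()
print(itr.send("10.0.0.1", "10.1.2.3", "syn"))    # None: dropped while resolving
print(itr.send("10.0.0.1", "10.1.2.3", "syn"))    # encapsulated towards 198.51.100.1
print(itr.send("10.0.0.1", "10.1.7.7", "data"))   # same EID prefix: cache hit
print("map requests:", itr.map_requests)          # 1
```

The map-cache grows with the set of destinations actually contacted rather than with the whole EID space, at the price of delaying or losing the first packets of each new destination.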


Figure: Host A (EID A) -> Host B (EID B). Both hosts run APP / TCP / Network; edge router A is the ITR for A -> B and the ETR for B -> A, edge router B is the ETR for A -> B and the ITR for B -> A. Each ITR performs an EID-to-RLOC lookup (e.g. Map Request <?, EID B>, answered by Map Reply <RLOC B, EID B>); LOC A and LOC B are the locators of edges A and B.


Main challenges

• Responsiveness to the spatio-temporal properties of incoming traffic (and their variations)
– Differential delay and/or drop of initial incoming packets -> effect on the transport layer (e.g. congestion and flow control)
– Port scanning (EID-to-RLOC cache updates)

• Churn: effect of changes to "in-use" EID-to-RLOC mappings
– Changes in EID reachability (at egress) -> effect on established flows
– Asymmetric forwarding paths (dual edges)

• De-aggregation: EID block segmentation
– EID sub-blocks decomposed and allocated to multiple RLOCs (forwarding is still longest-match prefix based)
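The churn issue follows from the memory-less egress: when a mapping changes, nothing tells the ITRs holding the old entry. A minimal sketch, assuming cached entries simply expire after a fixed TTL (in time steps), no explicit notification mechanism, and one packet per step towards a single EID prefix whose RLOC changes at t = 5; the RLOC names are placeholders:

```python
def simulate(ttl, change_at=5, horizon=30):
    """One packet per time step from an ITR towards one EID prefix whose RLOC
    changes at `change_at`. Returns (stale_packets, map_requests)."""
    current = "RLOC-1"          # authoritative mapping held by the mapping system
    cached, expires = None, 0   # ITR map-cache entry and its expiry time
    stale = requests = 0
    for t in range(horizon):
        if t == change_at:
            current = "RLOC-2"  # e.g. the site is re-homed; ITRs are not notified
        if cached is None or t >= expires:
            cached, expires = current, t + ttl  # Map-Request/Map-Reply refresh
            requests += 1
        if cached != current:
            stale += 1          # packet tunnelled to the old, possibly unreachable RLOC
    return stale, requests


for ttl in (1, 3, 10):
    print(ttl, simulate(ttl))   # (0, 30), (1, 10), (5, 3)
```

Longer TTLs mean fewer Map-Requests but more packets sent to a stale locator after a change; shorter TTLs reverse that tradeoff.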

...More fundamentally

• The Locator/ID Separation Protocol (LISP) is a form of name-independent routing, using topology-unaware flat addressing, running on top of a name-dependent routing scheme
– In addition, to maintain and update the EID-to-RLOC tables, the network maintains and updates a distributed database of EID-to-RLOC mappings (indirection layer)
– The average scaling characteristics of name-independent routing schemes cannot be better than those of name-dependent ones

• Reason: name-independent schemes are essentially name-dependent schemes plus mapping tables and an ID-to-RLOC name-resolution mechanism, which incur both a routing table size increase and stretch

• Bottom line: LISP thus cannot directly result in a global routing scalability improvement

Design Principles?

• The end-to-end principle emphasizes
– Functional placement: guides the placement and spatial distribution of functionality
– Correctness and completeness: a (sub-)system should consider only functions that can be completely and correctly implemented within it
• Don’t implement a function at lower layers unless it can be completely and correctly implemented at that level (relieve the burden on hosts)
• Don’t rely on information or processing that is not available along the data path, as it makes the network layer more complicated (example: DNS)
– Overall system cost-performance tradeoff
• If an application can implement a functionality correctly, implement it at a lower layer only as a performance enhancement, and iff it does not impose a burden on applications that do not require that functionality
• Don’t put application semantics in the network: it leads to a loss of flexibility
– Existing applications cannot be changed easily, and new applications cannot be introduced easily

• Fate sharing
– The network does not maintain any state about the application data flows that traverse it (app-stateless nature of the network)

• Minimum intervention principle

Which architectural alternative?

• Net result: +1 layer (only?)
• Gain? Does it improve cost x complexity, perf. x functionality?
• Erosion of the end-to-end principle: network-aware applications and application-aware network
• End-to-end principle; loose/weak coupling
• RFC 1925, Art. 5

Figure: Mediation vs Fusion (information and communication); layers: Information, Communication, Documents
