Improving Internet Availability with Path Splicing
Murtaza Motiwala, Nick Feamster
Santosh Vempala
2
“It is not difficult to create a list of desired characteristics for a new Internet. Deciding how to design and deploy a network that achieves these goals is much harder. Over time, our list will evolve. It should be:
1. Robust and available. The network should be as robust, fault-tolerant and available as the wire-line telephone network is today.
2. …
Availability
3
Availability of Other Services
• Carrier Airlines (2002 FAA Fact Book)
  – 41 accidents, 6.7M departures
  – 99.9993% availability
• 911 Phone service (1993 NRIC report +)
  – 29 minutes per year per line
  – 99.994% availability
• Std. Phone service (various sources)
  – 53+ minutes per line per year
  – 99.99+% availability
4
Can the Internet Be “Always On”?
• Various studies (Paxson, etc.) show the Internet is at about 2.5 “nines”
• More “critical” (or at least availability-centric) applications on the Internet
• At the same time, the Internet is getting more difficult to debug
  – Increasing scale, complexity, disconnection, etc.
Is it possible to get to “5 nines” of availability? If so, how?
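For scale, the availability figures quoted so far can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming a 365-day year and treating each accident as one failed departure; the function names are illustrative and not from the talk.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a non-leap year


def availability_from_downtime(minutes_down_per_year: float) -> float:
    """Fraction of the year a service is up, given annual downtime in minutes."""
    return 1.0 - minutes_down_per_year / MINUTES_PER_YEAR


def availability_from_failures(failures: float, attempts: float) -> float:
    """Fraction of attempts that succeed (assumption: one accident = one failure)."""
    return 1.0 - failures / attempts


def downtime_minutes_per_year(nines: float) -> float:
    """Annual downtime implied by a number of 'nines' (3 nines = 99.9% up)."""
    unavailability = 10.0 ** (-nines)
    return unavailability * MINUTES_PER_YEAR


if __name__ == "__main__":
    print(f"911 service (29 min/yr):    {availability_from_downtime(29):.4%}")
    print(f"Std. phone (53 min/yr):     {availability_from_downtime(53):.4%}")
    print(f"Airlines (41 of 6.7M):      {availability_from_failures(41, 6.7e6):.5%}")
    print(f"2.5 nines, minutes down/yr: {downtime_minutes_per_year(2.5):.0f}")
    print(f"5 nines, minutes down/yr:   {downtime_minutes_per_year(5):.2f}")
```

Under these assumptions, going from 2.5 nines to 5 nines means cutting annual downtime from roughly a day to a few minutes.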
5
Availability: Two Aspects
• Reliability: Connectivity in the routing tables should approach that of the underlying graph
  – If two nodes s and t remain connected in the underlying graph, there is some sequence of hops in the routing tables that will result in traffic from s reaching t
• Recovery: In case of failure (i.e., link or node removal), nodes should quickly be able to discover a new path
6
Where Today’s Protocols Stand
• Reliability: Routing protocols are single path.
  – When a link or node failure occurs, routers must recompute new paths to each destination
  – Approach: Compute backup paths
  – Challenge: Many possible failure scenarios!
• Recovery: Today’s Internet routing protocols
  – Meanwhile, packets are dropped, reordered, etc.
  – Approach: Switch to a backup when a failure occurs
  – Challenge: Must quickly discover a new working path
7
Multipath: Promise and Problems
• Bad: If a link fails on each of the two paths, s is disconnected from t
• Want: End systems remain connected unless the underlying graph has a cut
[Figure: paths between source s and destination t]
8
Path Splicing: Main Idea
• Step 1 (Perturbations): Run multiple instances of the routing protocol, each with slightly perturbed versions of the configuration
• Step 2 (Slicing): Allow traffic to switch between instances at any node in the protocol
[Figure: source s and destination t]
Compute multiple forwarding trees per destination. Allow packets to switch slices midstream.
9
Outline
• Path Splicing
  – Achieving Reliable Connectivity
    • Mechanism #1: Random Perturbations
    • Mechanism #2: Network Slicing
  – Forwarding
  – Recovery
• Properties
  – High Reliability
  – Bounded Stretch
  – Fast recovery
• Ongoing Work
10
Mechanism #1: Perturbations
• Goal: Each instance provides different paths
• Mechanism: Each edge is given a weight that is a slightly perturbed version of the original weight
  – Two schemes: Uniform and degree-based
[Figure: “Base” graph between s and t (link weights 3, 3, 3) and the perturbed graph (link weights 3.5, 4, 5, 1.5, 1.5, 1.25)]
11
How to Perturb the Link Weights?
• Uniform: Perturbation is a function of the initial weight of the link
• Degree-based: Perturbation is a linear function of the degrees of the incident nodes
  – Intuition: Deflect traffic away from nodes where traffic might tend to pass through by default
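The two schemes can be sketched as follows. The slides do not give the exact formulas, so everything here is an assumption for illustration: the uniform scheme adds a random amount between 0 and `scale` times the original weight, and the degree-based scheme scales that range linearly from `a` to `b` according to the sum of the endpoint degrees.

```python
import random
from collections import Counter
from typing import Dict, Tuple

Edge = Tuple[str, str]        # undirected link, stored once as (u, v)
Weights = Dict[Edge, float]


def perturb_uniform(weights: Weights, scale: float, rng: random.Random) -> Weights:
    """Assumed uniform scheme: w' = w + Uniform(0, scale * w)."""
    return {e: w + rng.uniform(0.0, scale * w) for e, w in weights.items()}


def perturb_degree_based(weights: Weights, a: float, b: float,
                         rng: random.Random) -> Weights:
    """Assumed degree-based scheme: perturbation strength grows linearly
    from a to b with deg(u) + deg(v), so links at high-degree nodes are
    perturbed the most."""
    degree = Counter()
    for u, v in weights:
        degree[u] += 1
        degree[v] += 1
    sums = {e: degree[e[0]] + degree[e[1]] for e in weights}
    lo, hi = min(sums.values()), max(sums.values())
    out: Weights = {}
    for e, w in weights.items():
        frac = 0.0 if hi == lo else (sums[e] - lo) / (hi - lo)
        strength = a + (b - a) * frac
        out[e] = w + rng.uniform(0.0, strength * w)
    return out


if __name__ == "__main__":
    base = {("s", "a"): 3.0, ("a", "t"): 3.0, ("s", "b"): 3.0,
            ("b", "t"): 3.0, ("a", "b"): 3.0}
    rng = random.Random(0)
    print(perturb_uniform(base, 0.5, rng))
    print(perturb_degree_based(base, 0.0, 2.0, rng))
```

Each routing instance would then run its usual shortest-path computation on its own perturbed copy of the weights.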
12
Mechanism #2: Network Slicing
• Goal: Allow multiple instances to co-exist
• Mechanism: Virtual forwarding tables
[Figure: topology with nodes s, a, b, c, t and per-slice forwarding tables (dst → next-hop). Slice 1: t → a. Slice 2: t → c.]
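A minimal sketch of how per-slice forwarding tables could be built: one shortest-path tree per destination per slice, each computed on a differently perturbed copy of the link weights. The topology, the perturbation rule, and all names below are assumptions for illustration, not the authors' implementation.

```python
import heapq
import random
from typing import Dict, List

Graph = Dict[str, Dict[str, float]]  # node -> neighbor -> weight (symmetric)


def next_hops_toward(graph: Graph, dst: str) -> Dict[str, str]:
    """Dijkstra rooted at dst: maps each node to its next hop toward dst."""
    dist = {dst: 0.0}
    next_hop: Dict[str, str] = {}
    heap = [(0.0, dst)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                next_hop[v] = u  # v reaches dst by forwarding to u
                heapq.heappush(heap, (nd, v))
    return next_hop


def perturbed(graph: Graph, rng: random.Random, scale: float = 1.0) -> Graph:
    """Copy of graph with each link weight raised by Uniform(0, scale * w)."""
    out: Graph = {u: {} for u in graph}
    for u in graph:
        for v, w in graph[u].items():
            if v not in out[u]:  # perturb each undirected link once
                new_w = w + rng.uniform(0.0, scale * w)
                out[u][v] = new_w
                out[v][u] = new_w
    return out


def build_slices(graph: Graph, k: int, seed: int = 0) -> List[Dict[str, Dict[str, str]]]:
    """slices[i][dst][node] is node's next hop toward dst in slice i.
    Slice 0 uses the original weights; the others use perturbed weights."""
    rng = random.Random(seed)
    slices = []
    for i in range(k):
        g = graph if i == 0 else perturbed(graph, rng)
        slices.append({dst: next_hops_toward(g, dst) for dst in graph})
    return slices


# Hypothetical topology using the node names from the figure.
GRAPH: Graph = {
    "s": {"a": 3.0, "b": 3.0},
    "a": {"s": 3.0, "t": 3.0, "c": 3.0},
    "b": {"s": 3.0, "c": 3.0},
    "c": {"a": 3.0, "b": 3.0, "t": 3.0},
    "t": {"a": 3.0, "c": 3.0},
}

if __name__ == "__main__":
    for i, table in enumerate(build_slices(GRAPH, k=3, seed=1)):
        print(f"slice {i}: next hops toward t = {table['t']}")
```

State grows with the number of slices k, since each router keeps k tables instead of one.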
13
Forwarding Traffic
• Packet has shim header with forwarding bits
• Routers use lg(k) bits to index forwarding tables
  – Shift bits after inspection
• To access different (or multiple) paths, end systems simply change the forwarding bits
  – Incremental deployment is trivial
  – Persistent loops cannot occur
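The shim-header handling can be sketched as bit packing. This is a minimal sketch under stated assumptions: the header is modeled as a Python integer, the first hop's choice sits in the low-order bits, and a value outside the range of slices is reduced modulo k. None of these details are specified in the slides.

```python
from typing import List, Tuple


def bits_per_hop(k: int) -> int:
    """Number of bits needed to name one of k slices: ceil(lg k)."""
    if k < 1:
        raise ValueError("need at least one slice")
    return (k - 1).bit_length()


def encode(slice_choices: List[int], k: int) -> int:
    """End system: pack one slice index per hop, first hop in the low bits."""
    width = bits_per_hop(k)
    bits = 0
    for hop, choice in enumerate(slice_choices):
        if not 0 <= choice < k:
            raise ValueError(f"slice index {choice} out of range")
        bits |= choice << (hop * width)
    return bits


def pop_slice(bits: int, k: int) -> Tuple[int, int]:
    """Router: read the low lg(k) bits, then shift them off the header.
    Returns (slice to use at this hop, remaining forwarding bits)."""
    width = bits_per_hop(k)
    mask = (1 << width) - 1
    return (bits & mask) % k, bits >> width


if __name__ == "__main__":
    header = encode([1, 0, 3, 2], k=4)
    while header:
        chosen, header = pop_slice(header, k=4)
        print("use slice", chosen)
```

Once the bits are exhausted the header reads as zero, which in this sketch selects slice 0 for every remaining hop.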
14
Putting It Together
• End system sets forwarding bits in packet header
• Forwarding bits specify slice to be used at any hop
• Router: examines/shifts forwarding bits, and forwards
[Figure: source s and destination t]
15
A Definition Motivated by Reliability
• Reliability: the probability that, upon failing each edge with probability p, the graph remains connected
• Reliability curve: the fraction of source-destination pairs that remain connected for various link failure probabilities p
• The underlying graph has an underlying reliability (and reliability curve)
  – Goal: Reliability of routing system should approach that of the underlying graph.
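A reliability curve as defined above can be estimated by Monte Carlo simulation. The sketch below is one straightforward way to do it, assuming independent link failures and undirected links; the function names and the example graph are illustrative.

```python
import random
from typing import List, Sequence, Tuple

Edge = Tuple[str, str]


def connected_pair_fraction(nodes: Sequence[str], edges: Sequence[Edge]) -> float:
    """Fraction of unordered node pairs joined by some path (union-find)."""
    parent = {n: n for n in nodes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    sizes = {}
    for node in nodes:
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    total_pairs = len(nodes) * (len(nodes) - 1) // 2
    connected = sum(c * (c - 1) // 2 for c in sizes.values())
    return connected / total_pairs


def disconnected_fraction(nodes: Sequence[str], edges: Sequence[Edge], p: float,
                          trials: int, rng: random.Random) -> float:
    """Average fraction of pairs disconnected when each link fails with prob. p."""
    total = 0.0
    for _ in range(trials):
        surviving = [e for e in edges if rng.random() >= p]
        total += 1.0 - connected_pair_fraction(nodes, surviving)
    return total / trials


def reliability_curve(nodes, edges, ps, trials=1000, seed=0) -> List[Tuple[float, float]]:
    """One (p, fraction disconnected) point per failure probability in ps."""
    rng = random.Random(seed)
    return [(p, disconnected_fraction(nodes, edges, p, trials, rng)) for p in ps]


if __name__ == "__main__":
    nodes = ["s", "a", "b", "t"]
    edges = [("s", "a"), ("a", "t"), ("s", "b"), ("b", "t")]
    for p, frac in reliability_curve(nodes, edges, [0.0, 0.05, 0.1, 0.2]):
        print(f"p={p:.2f}  disconnected={frac:.3f}")
```

Passing the full edge set gives the curve for the underlying graph; passing only the edges that appear in the forwarding tables gives the curve for the routing system, which is the comparison the goal refers to.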
16
Reliability Curve: Illustration
[Figure: reliability curve. x-axis: probability of link failure (p). y-axis: fraction of source-dest pairs disconnected. An annotation marks the direction of better reliability.]
More edges available to end systems -> Better reliability
17
Reliability Approaches Optimal
• Sprint (Rocketfuel) topology
• 1,000 trials
• p indicates probability edge was removed from base graph
[Figure: reliability curves for the Sprint topology with degree-based perturbations. Annotations: “Reliability approaches optimal”; “Average stretch is only 1.3”.]
18
Recovery is Fast
• Which paths can be recovered within 5 trials?
  – Sequential trials: 5 round-trip times
  – …but trials could also be made in parallel
Recovery approaches maximum possible
Adding a few more slices improves recovery beyond best possible reliability with fewer slices.
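The recovery procedure (retry with freshly chosen forwarding bits, up to five times) can be sketched as below. The two next-hop tables and the failure set are hypothetical, and per-hop slice choices are modeled as a list rather than a packed header.

```python
import random
from typing import Dict, FrozenSet, List, Optional, Set

# Hypothetical next-hop tables toward destination "t", one per slice.
SLICES: List[Dict[str, str]] = [
    {"s": "a", "a": "t", "b": "t", "c": "t"},  # slice 0
    {"s": "b", "a": "c", "b": "c", "c": "t"},  # slice 1
]


def deliver(src: str, dst: str, choices: List[int], slices: List[Dict[str, str]],
            failed: Set[FrozenSet[str]], max_hops: int = 16) -> bool:
    """Forward hop by hop; hop i uses slice choices[i] (slice 0 once exhausted)."""
    node = src
    for hop in range(max_hops):
        if node == dst:
            return True
        chosen = choices[hop] if hop < len(choices) else 0
        nxt = slices[chosen].get(node)
        if nxt is None or frozenset((node, nxt)) in failed:
            return False  # no route, or the chosen link is down
        node = nxt
    return node == dst


def recover(src: str, dst: str, slices: List[Dict[str, str]],
            failed: Set[FrozenSet[str]], rng: random.Random,
            trials: int = 5, header_hops: int = 8) -> Optional[int]:
    """Try random forwarding bits; return the trial number that worked, if any."""
    for attempt in range(1, trials + 1):
        choices = [rng.randrange(len(slices)) for _ in range(header_hops)]
        if deliver(src, dst, choices, slices, failed):
            return attempt
    return None


if __name__ == "__main__":
    down = {frozenset(("s", "a"))}  # the default first hop has failed
    print("default path works:", deliver("s", "t", [], SLICES, down))
    print("recovered on trial:", recover("s", "t", SLICES, down, random.Random(0)))
```

Trials here are sequential, one per round trip; sending several differently marked packets at once would be the parallel variant mentioned above.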
19
Stretch is Bounded
• Stretch: How much longer is the path taken by packets than the “optimal” path?
  – Stretch is bounded in one slice by the amount of perturbation
  – …but what about the stretch of spliced paths?
  – As long as “significant progress” (a large fraction of the distance to d) is achieved at each hop, stretch is bounded
Implication: Loops are rare.
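Stretch can be computed as a simple cost ratio. A minimal sketch, assuming stretch is measured with the original (unperturbed) link weights; the weights and paths are made up for illustration.

```python
from typing import Dict, FrozenSet, List

Weights = Dict[FrozenSet[str], float]  # undirected link -> original weight


def path_cost(path: List[str], weights: Weights) -> float:
    """Sum of original link weights along a node sequence."""
    return sum(weights[frozenset((u, v))] for u, v in zip(path, path[1:]))


def stretch(taken: List[str], optimal: List[str], weights: Weights) -> float:
    """Cost of the path actually taken divided by the cost of the optimal path."""
    return path_cost(taken, weights) / path_cost(optimal, weights)


if __name__ == "__main__":
    weights: Weights = {
        frozenset(("s", "a")): 3.0,
        frozenset(("a", "t")): 3.0,
        frozenset(("s", "b")): 3.0,
        frozenset(("b", "t")): 4.0,
    }
    print(stretch(["s", "b", "t"], ["s", "a", "t"], weights))
```

For the single-slice bound: if every perturbed weight w' satisfies w <= w' <= (1 + c) * w (an assumption about the perturbation scheme), then the slice's shortest path P' and the true shortest path P* satisfy cost(P') <= cost'(P') <= cost'(P*) <= (1 + c) * cost(P*), so stretch within one slice is at most 1 + c.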
20
Summary: Splicing Improves Availability
• Reliability: Connectivity in the routing tables should approach that of the underlying graph
  – Approach: Overlay trees generated using random link-weight perturbations. Allow traffic to switch between them.
  – Result: Splicing ~ 10 trees achieves near-optimal reliability
• Recovery: In case of failure (i.e., link or node removal), nodes should quickly be able to discover a new path
  – Approach: End nodes randomly select new bits.
  – Result: Recovery within five trials approaches best possible.
21
Open Questions and Future Work
• How does splicing interact with traffic engineering? Sources controlling traffic?
• What are the best mechanisms for reliability and recovery?
• What changes are required to today’s routers to make splicing possible?
• Can splicing eliminate dynamic routing?
22
Variation: BGP Splicing
• Observation: Many routers already learn multiple alternate routes to each destination.
• Idea: Use the forwarding bits to index into these alternate routes at an AS’s ingress and egress routers.
Required new functionality:
• Storing multiple entries per prefix
• Indexing into them based on packet headers
• Selecting the “best” k routes for each destination
[Figure: default and alternate routes toward destination d. Caption: Splice paths at ingress and egress routers]
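The three pieces of new functionality can be illustrated with a toy route table. This is a sketch only: `SplicedRib` is a hypothetical class, lookups are exact-match on the prefix string (no longest-prefix match), and the caller is assumed to supply routes already ranked best-first.

```python
from typing import Dict, List


class SplicedRib:
    """Toy table that keeps up to k next hops per prefix (hypothetical)."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.routes: Dict[str, List[str]] = {}

    def install(self, prefix: str, ranked_next_hops: List[str]) -> None:
        """Store the k best routes; the input is assumed ranked best-first."""
        self.routes[prefix] = list(ranked_next_hops)[: self.k]

    def lookup(self, prefix: str, forwarding_bits: int) -> str:
        """Index into the stored routes using the packet's forwarding bits.
        Bits of 0 select the default (best) route."""
        entries = self.routes[prefix]
        return entries[forwarding_bits % len(entries)]


if __name__ == "__main__":
    rib = SplicedRib(k=2)
    rib.install("192.0.2.0/24", ["via-AS1", "via-AS2", "via-AS3"])
    print(rib.lookup("192.0.2.0/24", 0))  # default route
    print(rib.lookup("192.0.2.0/24", 1))  # alternate route
```

Wrapping the index with a modulo is a design choice of this sketch so that any bit pattern maps to some stored route.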
23
Conclusion
• Simple: Forwarding bits provide access to different paths through the network
• Scalable: Exponential increase in available paths, linear increase in state
• Stable: Fast recovery does not require fast routing protocols
• No modifications to existing routing protocols
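The scalability claim comes down to counting. A minimal sketch, assuming k slices, a path of a given number of hops, and one forwarding entry per destination per slice; the count of slice sequences is an upper bound on distinct paths, since different sequences can map to the same path.

```python
def slice_sequences(k: int, hops: int) -> int:
    """Number of ways to pick a slice at each hop: k ** hops (upper bound on paths)."""
    return k ** hops


def forwarding_entries(k: int, destinations: int) -> int:
    """Per-router state: one entry per destination in each of k tables."""
    return k * destinations


if __name__ == "__main__":
    for k in (1, 2, 5, 10):
        print(k, slice_sequences(k, hops=4), forwarding_entries(k, destinations=100))
```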
http://www.cc.gatech.edu/~feamster/papers/splicing-hotnets.pdf
24
25
History: Network Embedding
• Given: virtual (V) and physical (P) network
  – Topology, constraints, etc.
• Problem: find the appropriate mapping onto available physical resources (nodes and edges)
• Idea: Define a virtual graph G’ onto which G can be embedded
• A link in G can be mapped to multiple links in G’
• How to forward traffic over multiple links in G’?
• …
26
Possible Applications/Future Work
• Fast recovery from poorly performing paths
• Data transfer with easy multi-path
  – Overlay networks, CDNs, etc.
  – Transfer of video with multiple description
• Security applications
• Spatial diversity in wireless networks
27
Significant Novelty for Modest Stretch
• Novelty: difference in nodes in a perturbed shortest path from the original shortest path
Example (path from s to d):
Novelty: 1 – (1/3) = 2/3, where 1/3 is the fraction of edges on the short path shared with the long path
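The novelty calculation in the example can be written out directly. A minimal sketch, following the edge-based form used in the example (one minus the fraction of the short path's edges that also appear on the long path); the node names other than s and d are made up.

```python
from typing import FrozenSet, List, Set


def edge_set(path: List[str]) -> Set[FrozenSet[str]]:
    """Undirected links along a node sequence."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def novelty(short_path: List[str], long_path: List[str]) -> float:
    """1 minus the fraction of the short path's edges shared with the long path."""
    short_edges = edge_set(short_path)
    shared = short_edges & edge_set(long_path)
    return 1.0 - len(shared) / len(short_edges)


if __name__ == "__main__":
    short_path = ["s", "a", "b", "d"]      # 3 edges
    long_path = ["s", "a", "c", "e", "d"]  # shares only the link s-a
    print(novelty(short_path, long_path))
```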
28
Related Work
• Pre-Computed Backup Paths
  – Multi-Topology Routing
  – Multiple Routing Configurations
  – MPLS Fast Reroute
• End-Node Controlled Traffic
  – Source routing
  – Routing deflections
• Multipath routing (ECMP, MIRO, etc.)
• IGP link-weight optimization
• Measurement of path diversity and multihoming
• Layer-3 VPNs
29
Other Properties
• Scalable
  – Exponential increase in paths, linear increase in state
• Fast recovery from underlying failures
• Automatic tuning (e.g., for traffic engineering)
  – Perturbations achieve property of automatically spreading traffic across different links
  – Standard link-weight optimization is potentially brittle in the face of link failures
• Incrementally deployable
30
Prototype Implementation
• Click and Quagga on PL-VINI
  – http://www.vini-veritas.net/
[Figure: prototype components: a Classifier and two instances, each with a Control Plane, a Daemon, and a Forwarding Table]
31
Loops, Reconsidered
• Problem: Potential for loops between ASes
  – AS-level loops can be longer than intra-AS loops
• Two possible approaches
  – Detection: routers mark packets and determine that packets have traversed the same AS twice
  – Prevention: Exploit “common” routing policies to ensure that packets are only deflected along valley-free paths
32
Preventing Inter-AS Loops with Policy
Observation: inter-AS loops inherently involve a traversal that violates the valley-free property
Constraints:
1. once a “down” deflection has occurred, do not deflect again
2. only allow one “across” deflection
Possible relaxation: allow a limited number of violations, specified by source
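The two constraints can be expressed as a small check on a packet's deflection history. This is a sketch under one reading of the slide: each deflection is labeled "up" (toward a provider), "across" (to a peer), or "down" (toward a customer), and only the two listed rules are enforced. The labels and function name are assumptions.

```python
from typing import Iterable

DIRECTIONS = {"up", "across", "down"}


def deflection_allowed(previous: Iterable[str], proposed: str) -> bool:
    """Apply the two constraints to a packet's deflection history.

    previous: directions of deflections already taken, in order.
    proposed: direction of the deflection being considered.
    """
    history = list(previous)
    if proposed not in DIRECTIONS or any(d not in DIRECTIONS for d in history):
        raise ValueError("direction must be 'up', 'across', or 'down'")
    if "down" in history:
        return False  # constraint 1: no deflection after a "down" deflection
    if proposed == "across" and "across" in history:
        return False  # constraint 2: at most one "across" deflection
    return True


if __name__ == "__main__":
    print(deflection_allowed([], "down"))
    print(deflection_allowed(["down"], "up"))
    print(deflection_allowed(["across"], "across"))
```

The relaxation mentioned above could be modeled by letting the source supply a small budget of permitted violations instead of returning False outright.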
34
Definitions of Path Diversity
• Connectivity: Minimum number of edges whose failure disconnects the graph (min cut)
• Expansion: Intuitively, small cuts disconnect small groups of nodes from the graph
35
Design Goals
• Reachability: allow endpoints to communicate
• High Diversity: expose paths to end hosts that survive failures
  – Capacity: the total available data rate between each source-destination pair should be high
  – Fault tolerance: the number of disjoint paths should be high, and the network should remain connected under failures
• Low Stretch: paths should not be too circuitous
• Scalability: scale to a large number of networks, destinations, routers, etc.
Today’s routing protocols do not exploit the diversity of the underlying network graph