1
Would Diversity Really Increase the Robustness of the Routing Infrastructure
against Software Defects?
February 2008
Juan Caballero, Theocharis Kampouris (Carnegie Mellon)
Dawn Song (Carnegie Mellon & UC Berkeley)
Jia Wang (AT&T Labs)
The answer is: Yes
2
Software defects in routers
• Defects in router software not uncommon
• Multiple vulnerabilities in routers uncovered
– DoS: maliciously crafted packets cause reload [CERT5]
– DoS: maliciously crafted packets cause excessive resource consumption [CERT2, CERT4]
– Remote execution of system-level commands [CERT3]
– Unauthorized privileged access [CERT1]
– Possible remote shell execution [CERT7]
3
Simultaneous router failure
• Routing infrastructure highly homogeneous
• What if a software defect makes it possible to simultaneously take down many routers?
– Worst-case scenario. Rare.
– But huge impact: highly damaging to an ISP’s reputation
• Diversity
– Multiple implementations from different code bases
– Reduces number of nodes affected by a bug [Zhang01, Junqueira05, O’Donnell04]
• But, how well would it work on routers?
4
Scope
• We focus on the effect on network connectivity
– Impact on higher layers left as future work
– Includes: routing convergence, packet loss, delay…
• Why?
– Because no connectivity means no communication
• What about fundamental limitations of diversity?
– Vulnerabilities that are shared among vendors
» General problem with no good solution
– Deployment cost
» Depends on how much diversity is already available
5
Statement
• This paper does not claim:
– that diversity can protect against all software defects
– that we should redesign all networks to accommodate diversity
• Rather, we show:
– that diversity greatly helps with simultaneous router failures
– that networks might already have a surprising amount of diversity
• But, it is not used to increase the robustness!
6
Contributions
• Answering four fundamental questions:
1. How do we measure robustness of a network against simultaneous router failures?
2. How to best use the diversity?
3. How much diversity is needed to guarantee a certain degree of robustness?
4. Is there enough diversity already in the network or do we need to introduce more?
7
Problem definition
• Graph theoretic approach G = (V,E)
– Nodes are routers (V), Edges are links (E)
• A version of a graph coloring problem where:
– Colors represent implementations
– A failure is a color removal
– Different from well-known optimal coloring problem
• Network Robustness = Resilience to simultaneous router failure
– How connected is the network when multiple nodes fail?
• The goal is to assign a color to each router from a set of k available colors such that the network robustness (Φ) is maximized
8
Determining the best coloring
• Abilene network with 2 colors (k = 2)
[Figure: four alternative 2-colorings of the Abilene network, with Φ = 0.18, Φ = 0.42, Φ = 0.05, and Φ = 0.23]
We want to automatically select the best coloring
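For a graph as small as Abilene (12 nodes, k = 2), one conceivable baseline is to try every coloring and keep the one with the highest robustness. The sketch below is only an illustration under stated assumptions, not the selection method used in the talk: the 6-node ring, the helper names, and the choice to rank colorings by minimum NSLC first and average NSLC second are all hypothetical. A failed color is modeled by dropping its nodes and normalizing by the original node count.

```python
from itertools import product

def largest_component(alive, adj):
    """Size of the largest connected component among the surviving nodes."""
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w in alive and w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best

def score(coloring, adj, k):
    """Return (minimum, average) NSLC over the k single-color failures."""
    n = len(adj)
    values = []
    for color in range(k):
        # Assumption: nodes of the failed color are dropped, n stays the original size.
        alive = {v for v in adj if coloring[v] != color}
        values.append(largest_component(alive, adj) / n)
    return min(values), sum(values) / k

def best_coloring(adj, k):
    """Exhaustively search all k^n colorings (only feasible for tiny graphs)."""
    nodes = sorted(adj)
    best = None
    for assignment in product(range(k), repeat=len(nodes)):
        coloring = dict(zip(nodes, assignment))
        candidate = (score(coloring, adj, k), coloring)
        if best is None or candidate[0] > best[0]:
            best = candidate
    return best

# Hypothetical toy topology: a ring of 6 routers (not the Abilene graph).
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
(best_min, best_avg), coloring = best_coloring(ring, k=2)
print(best_min, best_avg, coloring)
```

The search space grows as k^n, so this is only workable as a sanity check on toy graphs; larger topologies need heuristics.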
9
Outline
• Introduction
• Metrics
– Connectivity
– Robustness
• Evaluation
• Algorithms
10
Metrics
• Need metrics to quantify the robustness of the colored graph, i.e., its resilience to failure
• We need two types of metrics:
1. Connectivity metrics: given a graph, determine how connected it is
– Many graph connectivity metrics already proposed
– We select some existing ones
2. Robustness metrics: given a colored graph, determine how robust it is
– We propose new ones
– The robustness metrics will be a function of the connectivity metrics
12
Connectivity metrics: NSLC
• Given a graph, determine how connected it is
• Normalized size of largest component (NSLC) [Albert00]
$$\mathrm{NSLC} = \frac{\text{Number of nodes in largest component}}{\text{Number of nodes in graph}}$$
[Figure: example graph A (1 component, NSLC = 1) and example graph B (2 components, NSLC = 0.66)]
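A minimal sketch of the NSLC computation, assuming an undirected graph given as a node list and an edge list. The two three-node graphs are hypothetical stand-ins chosen to reproduce the slide's values (1 and 0.66); the actual graphs A and B from the figure are not recoverable here.

```python
from collections import deque

def components(nodes, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps

def nslc(nodes, edges):
    """Normalized size of the largest component: |largest component| / |V|."""
    if not nodes:
        return 0.0
    return max(len(c) for c in components(nodes, edges)) / len(nodes)

# Hypothetical example graphs (three nodes each).
graph_a = (["x", "y", "z"], [("x", "y"), ("y", "z")])  # one component
graph_b = (["x", "y", "z"], [("x", "y")])              # two components

nslc_a = nslc(*graph_a)
nslc_b = nslc(*graph_b)
print(nslc_a, nslc_b)
```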
13
Connectivity metrics: PC
• Pair Connectivity (PC) [Park03]
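$$\mathrm{PC} = \frac{1}{\binom{n}{2}} \sum_{i=1}^{comp} \frac{|L_i|\,(|L_i| - 1)}{2}$$
where n is the number of nodes, comp is the number of components, and |L_i| is the number of nodes in component i
[Figure: example graph A (1 component, PC = 1) and example graph B (2 components, PC = 0.33)]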
We have versions of the metrics that support node weights
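Pair connectivity can be read as the fraction of node pairs that can still reach each other. The sketch below covers the unweighted case only, under the assumption that the graph is undirected; the node-weighted versions mentioned on the slide are not shown. The three-node graphs are hypothetical and chosen to match the slide's values (1 and 0.33).

```python
def component_sizes(nodes, edges):
    """Return the sizes of the connected components of an undirected graph."""
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, sizes = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        sizes.append(size)
    return sizes

def pair_connectivity(nodes, edges):
    """Connected node pairs divided by all possible node pairs."""
    n = len(nodes)
    if n < 2:
        return 0.0  # no pairs exist; returning 0 is a convention chosen here
    connected_pairs = sum(s * (s - 1) // 2 for s in component_sizes(nodes, edges))
    return connected_pairs / (n * (n - 1) // 2)

# Hypothetical example graphs (three nodes each).
pc_a = pair_connectivity(["x", "y", "z"], [("x", "y"), ("y", "z")])
pc_b = pair_connectivity(["x", "y", "z"], [("x", "y")])
print(pc_a, pc_b)
```

Because the numerator is quadratic in component size, PC penalizes fragmentation more strongly than NSLC does: the same split gives NSLC = 0.66 but PC = 0.33.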
15
Robustness metrics
• Robustness of a colored graph measures the remaining connectivity when a color is removed
– Remove a color => Disconnect all nodes using the color
• Robustness is a function of the connectivity metric f applied to each of the color-removal subgraphs
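• Probability of failure of each color is unknown
• Two metrics: average and minimum (worst-case)
$$\Phi^{avg}_{G,f} = \frac{1}{k} \sum_{i=1}^{k} f(G_i)$$
$$\Phi^{min}_{G,f} = \min_{1 \le i \le k} f(G_i)$$
where G_i is the subgraph that remains after removing color i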
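The two robustness metrics can be sketched as follows, using NSLC as the connectivity metric f. Two modeling choices here are assumptions, not statements from the slides: a failed color is handled by dropping its nodes and their links, and NSLC is normalized by the original node count. The 11-node path and its coloring are hypothetical; they were picked because they yield per-color values of about 0.18 and 0.82.

```python
def largest_component_size(alive, edges):
    """Largest connected component among the nodes in `alive`."""
    adj = {v: [] for v in alive}
    for u, v in edges:
        if u in alive and v in alive:
            adj[u].append(v)
            adj[v].append(u)
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best

def robustness(nodes, edges, coloring, colors):
    """Return (average, minimum) NSLC over the single-color failures."""
    n = len(nodes)
    scores = []
    for color in colors:
        # Assumption: every router running the failed implementation goes down.
        alive = {v for v in nodes if coloring[v] != color}
        scores.append(largest_component_size(alive, edges) / n)
    return sum(scores) / len(scores), min(scores)

# Hypothetical example: a path of 11 routers, 9 "red" and 2 "blue".
nodes = list(range(11))
edges = [(i, i + 1) for i in range(10)]
coloring = {v: ("red" if v < 9 else "blue") for v in nodes}

avg_rob, min_rob = robustness(nodes, edges, coloring, ["red", "blue"])
print(round(avg_rob, 2), round(min_rob, 2))
```

In this toy case the average looks moderate while the minimum exposes that losing the dominant color leaves only 2 of 11 routers connected, which is why both metrics are reported.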
16
Minimum and average robustness
• Average robustness good
• Minimum robustness bad
• Average robustness can be misleading by itself
Robustness Metric (Φ) | G2
Average Robustness (NSLC) | 0.5
Minimum Robustness (NSLC) | 0.18
[Figure: graph G2 and its color-removal subgraphs G2red and G2blue, with NSLC = 0.18 and NSLC = 0.82]
18
Algorithms
• We have devised a total of 9 algorithms which can be classified into 4 families
• Only the Region coloring algorithms are presented in the paper
– Rest are in the extended version [ColoringTR]
• Region coloring algorithms outperform others in evaluation
19
Region coloring algorithms
• Divide the network into contiguous regions
• Regions are automatically found
• Includes 2 algorithms: Cluster & Partition
– Algorithms accept number of regions (k) as input
• Graph partitioning algorithms try to balance the number of nodes in each partition (i.e., region)
[Figure: network divided into Region 1 and Region 2]
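To make the idea of contiguous regions concrete, here is a rough sketch that grows k regions outward from k seed nodes, one node per region per round. This is a simple stand-in written for illustration; it is not the Cluster or Partition algorithm from the talk, and the function name, the seed choice, and the path topology are all hypothetical.

```python
from collections import deque

def grow_regions(adj, seeds):
    """Assign nodes to regions by round-robin growth from the seed nodes.

    Each region claims at most one unassigned neighbor per round, so every
    region stays contiguous and sizes stay roughly (not strictly) balanced.
    Nodes unreachable from every seed are left unassigned.
    """
    region = {seed: i for i, seed in enumerate(seeds)}
    frontiers = [deque([seed]) for seed in seeds]
    while any(frontiers):
        for i, frontier in enumerate(frontiers):
            while frontier:
                u = frontier[0]
                free = [w for w in adj[u] if w not in region]
                if not free:
                    frontier.popleft()  # u has nothing left to offer
                    continue
                region[free[0]] = i     # claim one neighbor, then yield the turn
                frontier.append(free[0])
                break
    return region

# Hypothetical topology: a path of 6 routers, seeded at both ends.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
region = grow_regions(path, seeds=[0, 5])
print(region)
```

The result depends heavily on where the seeds sit, which is one reason dedicated clustering and partitioning algorithms are preferable in practice.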
20
Results overview
• There is usually a trade-off between perfectly balanced partitions and contiguous partitions
• Results will show that:
1. Balanced regions are better
2. Slightly imbalanced but contiguous partitions are better than perfectly balanced but discontiguous partitions
[Figure: a good partition (Region 1, Region 2) and a bad partition (Region 1, Region 2, Region 1)]
21
Roles and Replicated nodes
• Roles:
– Not all routers can use all implementations
– Two roles: Access / Backbone
– One color-set for each role
– Nodes have roles and can only use implementations from the color-set of their role
• Replicated nodes:
– ISPs usually replicate important nodes
» Increases resilience against single node failures
» Load-sharing
– In real networks, replicas are colored identically
– For robustness, replicas need to be colored differently
22
Extended Partition Algorithm
1. Color all backbone routers
– Create backbone graph by removing all access routers
– First color replicas with different colors
– Then color rest using partition algorithm
2. Color the access routers
– Create the access graph by collapsing all backbone nodes into a single node
– Two cases depending on independence of access / backbone implementations
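The two graph constructions named in the steps above can be sketched as follows. This covers only how the backbone graph and the access graph might be derived from a role-annotated topology; the coloring steps themselves are not shown. The role labels, the `BACKBONE` super-node name, and the five-router topology are hypothetical.

```python
def backbone_graph(edges, role):
    """Keep only links whose two endpoints are both backbone routers."""
    return [(u, v) for u, v in edges
            if role[u] == "backbone" and role[v] == "backbone"]

def access_graph(edges, role, super_node="BACKBONE"):
    """Collapse all backbone routers into one node and deduplicate links."""
    collapsed = set()
    for u, v in edges:
        cu = super_node if role[u] == "backbone" else u
        cv = super_node if role[v] == "backbone" else v
        if cu != cv:  # drop backbone-to-backbone links, which become self-loops
            collapsed.add(tuple(sorted((cu, cv))))
    return sorted(collapsed)

# Hypothetical topology: two backbone routers and three access routers.
role = {"b1": "backbone", "b2": "backbone",
        "a1": "access", "a2": "access", "a3": "access"}
edges = [("b1", "b2"), ("a1", "b1"), ("a2", "b1"), ("a2", "b2"), ("a3", "a2")]

bb_edges = backbone_graph(edges, role)
acc_edges = access_graph(edges, role)
print(bb_edges)
print(acc_edges)
```

Collapsing the backbone into one node lets the access routers be partitioned according to how they attach to the backbone as a whole, without the backbone's internal structure influencing the regions.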
24
Evaluation Setup
Source | Topology | Date | Nodes | Edges
Real | Tier-1 ISP | Oct. 2006 | A few hundred | A couple thousand
Real | Cenic | Aug. 2006 | 51 | 91
Real | Abilene | Sep. 2006 | 12 | 15
Rocketfuel | Exodus | Jan. 2002 | 201 | 434
Rocketfuel | Sprint | Jan. 2002 | 604 | 2268
Rocketfuel | Verio | Jan. 2002 | 960 | 2821
Synth. | Fully connected | N/A | 100 | 4950
• Metrics + algorithms implemented using the JUNG graph library [JUNG]
• Graph clustering algorithm from Wu et al. [Wu04]
• Graph partition algorithm from Karypis et al. [Karypis00]
25
Coloring Algorithms: Setup
• Same topology (Tier-1 ISP) colored using different algorithms
• Random as “lower bound”
• Max as “upper bound”
26
Coloring Algorithms: Results
Partition/Cluster best on average
• Region coloring minimizes impact
Partition best on worst case
• More balanced coloring than Cluster
Partition performs close to Max in both average/worst cases
• Non-contiguous partitions are bad (dip at k=5)
27
Redistributing the existing diversity
Metric | Original coloring | Extended Partition
Average | 0.713 | 0.855
Minimum | 0.055 | 0.760
• Tier-1 ISP contains 8 implementations (2 backbone, 6 access)
– Due to: legacy routers, vendor change, budget constraints
• Two implementations used by 90% of the nodes
• What happens if we redistribute the same diversity using our algorithms?
Number of nodes in largest component goes from 5% to 76%
Requires:
1. Changing the number of nodes that use each implementation
2. Changing the geographical distribution of the implementations
28
Minimal diversity for decent robustness
Two colors are enough for the backbone
• Most backbone routers are replicated
Decent robustness starts with 3 colors for access routers
More than 5 colors for access routers adds little
29
Related Work
• Diversity as solution against software defects
– Diversity in all network layers [Zhang01]
– Diversity in distributed systems [Junqueira05]
– Diversity to slow malware propagation [O’Donnell04]
• Analysis of the Internet robustness [Albert00, Faloutsos99, Li04, Magoni03, Palmer01, Park03, Tangmunarunkit02, Zegura97]
• Analysis of failures in networks [Markopoulou04, NIST02]
• Router-level topologies [Spring02]
• Node Importance metrics [Freeman77, Lorrain71, Newman02, Tauro01]
• Clustering and Partitioning [Karypis00, Wu04, etc]
30
Conclusions
1. How do we measure robustness of a network against simultaneous router failures?
Proposed robustness metrics
2. How to use the diversity best?
Proposed coloring algorithms that achieve robustness close to the one obtained by a fully connected network
3. How much diversity is needed to guarantee a certain degree of robustness?
Not much. 2 backbone + 3 access for Tier-1 ISP
4. Is there enough diversity already in the network or do we need to introduce more?
Amount of diversity surprisingly high
Redistributing the diversity can increase the number of nodes surviving a failure from 5% to 76%
31
Questions?