1
Routing Economics under Big Data
Murat [email protected]
Computer Science and Engineering, University of Nevada – Reno, USA
2
Outline
• Routing in a Nutshell
• BigData Routing Problems
  – Economic Granularity
  – Routing Scalability
  – Bottleneck Resolution
• Summary
3
Routing in a Nutshell
URL: http://www.youtube.com
IP Address: 74.125.224.169
IP Prefix: 74.125.224/24
Path: 1951-7018-3356-10026-15169
[Figure: The path-vector advertisement for this prefix propagates from AS 15169 through Pac-Net (10026), Level3 (3356), and AT&T (7018) to Broad-Band One (1951) and NSHE, crossing local, regional, and Tier-1 ISPs.]
4
Routing in a Nutshell
[Figure: ISP hierarchy. Backbone (Tier-1) ISPs such as AT&T, Level3, Cogent, and SBC form the Internet core; regional and local ISPs such as NSHE sit in the customer cone, connected by customer/provider and peer/peer relationships.]
5
Routing in a Nutshell
[Figure: AT&T (AS 7018), Broad-Band One (AS 1951), and Level3 (AS 3356) as example domains.]
• Inter-domain routing among ISPs:
  – Single metric (number of hops / ISPs)
  – Partial network information
  – Scalable
• Intra-domain routing within an ISP network:
  – Multi-metric (delay, bandwidth, speed, packet loss rate, ...)
  – Computationally heavy
  – Complete network information in terms of links
  – Not scalable for large networks
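The intra-domain side of this contrast is link-state shortest-path routing over complete link information. A minimal sketch, using delay as the single metric for clarity (a real multi-metric setup would combine several such weights); the four-router topology is hypothetical:

```python
import heapq

def dijkstra(graph, src):
    """Shortest paths (here: delay in ms) computed from a complete
    link map, as an intra-domain link-state protocol would."""
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

# Hypothetical 4-router ISP network: router -> [(neighbor, delay in ms)]
net = {
    "r1": [("r2", 5), ("r3", 20)],
    "r2": [("r1", 5), ("r3", 4), ("r4", 30)],
    "r3": [("r1", 20), ("r2", 4), ("r4", 8)],
    "r4": [("r2", 30), ("r3", 8)],
}
delays = dijkstra(net, "r1")  # r1 reaches r4 via r2-r3 in 17 ms
```

Inter-domain routing cannot run such a computation: BGP sees only path vectors from neighbors, never the full link map, which is exactly why it scales.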
6
What if the flow is big? Really big?
[Figure: Big-Data Alice sends to Big-Data Bob across NSHE, Broad-Band One, AT&T, Level3, and Pac-Net.]
• A few Mb/s: negligible flow cost/value
• 100+ Gb/s: flow-aware economics?
7
Problem 1: Economic Granularity
[Figure: NSHE (AS 3851) buys a point-to-anywhere deal from AT&T (AS 7018); traffic to "anywhere" exits via Level3 (AS 3356).]
• Point-to-anywhere deals
• Not automated; rigid SLAs (6+ months ...)
• Transit service seen as a commodity
• Value-sharing structure: the edge gets all the money
8
Contract Routing Architecture
• An ISP is abstracted as a set of "contract links"
• Contract link: an advertisable contract
  – between peering/edge points i and j of an ISP
  – with the flexibility of advertising different prices for edge-to-edge (g2g) intra-domain paths
• Contract components
  – performance component, e.g., capacity
  – financial component, e.g., price
  – time component, e.g., term
• Capability of managing value flows at a finer granularity than point-to-anywhere deals
Global Internet 2008
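The contract-link abstraction can be sketched as a small record carrying the three component types listed above. Class and field names are illustrative, not from the Global Internet 2008 paper:

```python
from dataclasses import dataclass

# Hypothetical encoding of a "contract link": an advertisable contract
# between two edge points of an ISP, with the three component types
# named on this slide (performance, financial, time).
@dataclass(frozen=True)
class ContractLink:
    isp: str              # ISP advertising the contract
    ingress: str          # edge/peering point i
    egress: str           # edge/peering point j
    capacity_mbps: float  # performance component
    price_usd: float      # financial component
    term_mins: float      # time component

# An ISP is abstracted as a set of contract links; the same ingress can
# be advertised toward different egresses at different prices.
isp_a = {
    ContractLink("A", "1", "2", capacity_mbps=30, price_usd=8, term_mins=30),
    ContractLink("A", "1", "3", capacity_mbps=10, price_usd=7, term_mins=20),
}
cheapest = min(isp_a, key=lambda c: c.price_usd)
```

This is the finer granularity the slide argues for: buyers pick among priced g2g segments instead of a single point-to-anywhere deal.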
9
G2G Set-of-Links Abstraction
• Can change things a lot even for small scenarios.
10
G2G Set-of-Links Abstraction
• Max-throughput routing
[Figure: Average over 50 random topologies.]
ICC 2012
11
G2G Set-of-Links Abstraction
• Min-delay routing
[Figures: Average over 50 random topologies; average over 50 BRITE topologies.]
ICC 2012
12
Path-Vector Contract Routing
[Figure: User X, attached to ISP A, issues a path request for node 5; the request propagates through ISPs A, B, and C (edge points 1–4).]
• User request: [5, 10-30Mb/s, 15-45mins, $10]
• Contract advertisements collected along the way:
  – [5, A, 1-2, 15-30Mb/s, 15-30mins, $8]
  – [5, A, 1-3, 5-10Mb/s, 15-20mins, $7]
  – [5, A-B, 1-2-4, 15-20Mb/s, 20-30mins, $4]
• Paths to 5 are found, and ISP C replies to the user with two specific contract-path-vectors:
  – [A-B-C, 1-2-4-5, 20Mb/s, 30mins]
  – [A-C, 1-3-5, 10Mb/s, 15mins]
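The user's side of this exchange reduces to a range-overlap check between the request and each advertised contract-path-vector. A simplified sketch (it treats every advertisement as a complete offer toward the destination, and the tuple layout just mirrors the bracket notation on this slide; `satisfies` is an illustrative name):

```python
def satisfies(offer, request):
    """request = (dest, (bw_lo, bw_hi), (t_lo, t_hi), budget)
    offer   = (dest, isp_path, node_path, (bw_lo, bw_hi), (t_lo, t_hi), price)
    An offer is usable if it targets the same destination, fits the
    budget, and its bandwidth and duration ranges overlap the request."""
    if offer[0] != request[0] or offer[5] > request[3]:
        return False
    (rb_lo, rb_hi), (ob_lo, ob_hi) = request[1], offer[3]
    (rt_lo, rt_hi), (ot_lo, ot_hi) = request[2], offer[4]
    return (max(rb_lo, ob_lo) <= min(rb_hi, ob_hi)
            and max(rt_lo, ot_lo) <= min(rt_hi, ot_hi))

# The request and offers from this slide, as tuples.
request = (5, (10, 30), (15, 45), 10)            # [5, 10-30Mb/s, 15-45mins, $10]
offers = [
    (5, "A-B", "1-2-4", (15, 20), (20, 30), 4),  # [5, A-B, 1-2-4, ...]
    (5, "A",   "1-2",   (15, 30), (15, 30), 8),
    (5, "A",   "1-3",   (5, 10),  (15, 20), 7),
]
feasible = [o for o in offers if satisfies(o, request)]
```

In PVCR itself this check runs incrementally as the path-vector grows hop by hop, rather than once at the user.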
13
Results – Path Exploration
• Over 80% path-exploration success ratio even at 50% discovery-packet filtering, thanks to the diversity of Internet routes.
• With locality, PVCR achieves near-100% path-exploration success.
• As the budget increases with bTTL and MAXFWD, PVCR becomes robust to filtering.
GLOBECOM 2012
14
Results – Traffic Engineering
[Figure: Six surface plots of link price over a 10×10 grid of nodes; in some panels prices spread over roughly 20–30, while in the others they flatten to roughly 20–21.]
PVCR provides end-to-end coordination mechanisms: no hot-spots, no network bottlenecks.
ICC 2012
15
Problem 2: Routing Scalability
• Routing scalability is a burning issue!
  – Growing routing state and computational complexity
    • Timely lookups are harder to do
    • More control-plane burden
  – Growing demand for
    • Customizable routing (VPN)
    • Higher forwarding speeds
    • Path flexibility: policy, quality
16
Problem 2: Routing Scalability
• The cost of routing unit traffic is not scaling well
• Specialized router designs are getting costlier, currently > $40K
• BigData flows → more packets at faster speeds
• How do we scale routing functionality to BigData levels?
17
Offload the Complexity to the Cloud?
• Cloud services are getting abundant
  – Closer:
    • Delay to the cloud is decreasing [CloudCmp, IMC'10]
    • Bandwidth to the cloud is increasing
  – Cheaper:
    • CPU and memory are becoming commodities at the cloud
    • Cloud prices are declining
  – Computational speedups via parallelization
  – Scalable resources, redundancy
18
CAR: Cloud-Assisted Routing
• Goal: offload the growing routing complexity to the cloud
• Research question: if we maintain the full router functionality at the cloud but only partial functionality at the router hardware, can we solve some of the routing scalability problems?
[Figure: Router X (hardware with partial routing functions) exchanges updates and packets with its proxy, Router X (software with full routing functions), hosted in a cloud providing CAR services to many routers.]
Use the valuable router hardware for the most-used prefixes and the most urgent computations: Amdahl's Law in action!
19
CAR: An Architectural View
[Figure: Design space of router platforms, plotting scalability (packets/sec) against flexibility (number of configuration parameters), from specialized HW through hybrid SW/HW to pure SW, and from per-interface to per-flow to per-packet programmability. Points include specialized ASICs (Cisco Catalyst series), NetFPGA, SwitchBlade [17], RouteBricks [7], PacketShader [25], Click, OpenFlow, RCP [8], and Cisco CSR [10]. More platform dependence buys scalability; finer programmability buys flexibility. CAR sits at the barrier being pushed.]
20
CAR: A Sample BGP Peering Scenario
BGP peer establishment (without CAR):
• 400K prefixes exchanged (full table)
• Takes approx. 4–5 minutes, at peak CPU utilization
• Only 4K prefixes selected as best paths
BGP peer establishment with CAR:
• Step 1: full-table exchange between the cloud proxies
• Step 2: ORF list exchange between the routers and their proxies (Outbound Route Filtering, RFC 5291)
• Step 3: only the selected 4K prefixes exchanged initially between the routers; takes approx. 1–2 minutes
• Selected alternative paths out of the 400K are installed later
CAR's CPU principle: keep the control plane closer to the cloud! Offload heavy computations to the cloud.
Potential for a 5x speed-up and a 5x reduction of CPU load during BGP peer establishment.
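The filtering step at the heart of this scenario can be sketched in a few lines. The table sizes are scaled down (400 entries standing in for 400K, 4 for 4K), and `orf_filter` is an illustrative name, not an API from any BGP implementation:

```python
def orf_filter(full_table, orf_list):
    """Sketch of the Outbound Route Filtering idea (RFC 5291) as used
    in this CAR scenario: the router uploads the prefix list it wants,
    and the proxy sends only matching entries instead of the full table."""
    wanted = set(orf_list)
    return {prefix: route for prefix, route in full_table.items()
            if prefix in wanted}

# Toy numbers standing in for the 400K-prefix full table and the 4K
# best-path prefixes on this slide.
full_table = {f"10.{i}.0.0/16": f"path-{i}" for i in range(400)}
best_prefixes = [f"10.{i}.0.0/16" for i in range(0, 400, 100)]  # 4 of 400
initial_exchange = orf_filter(full_table, best_prefixes)
```

The router thus installs only the routes it will actually use first; the rest trickle in later from the proxy.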
22
CAR: Caching and Delegation
[Figure: Traffic is served from a partial FIB/RIB at the router, backed by the full FIB/RIB at the cloud proxy through regular updates and replacement, exploiting temporal and prefix continuity/spatiality.]
23
CAR: Caching and Delegation
• Hit (99.9%): forward from the partial FIB/RIB at the router
• Miss (0.1%):
  – 1st option: hold traffic in large buffers (150 ms) and resolve the next hop from the cloud proxy
  – 2nd option: reroute traffic to the cloud proxy via tunnels
• Revisiting Route Caching: The World Should Be Flat (PAM 2009):
  – One tenth of the prefixes account for 97% of the traffic
  – One fourth of the FIB can achieve a 0.1% miss rate
  – LRU replacement of the cache
CAR's memory principle: keep the data plane closer to the router; keep packet forwarding operations at the router to the extent possible.
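The caching option above can be simulated with stdlib Python. The workload below is hypothetical, but it mirrors the PAM 2009 observation that a few heavy prefixes dominate traffic, so a cache of one fourth the FIB sees a tiny miss rate under LRU replacement (1% in this toy run; the paper reports 0.1% at Internet scale):

```python
from collections import OrderedDict

class LRUFibCache:
    """Minimal LRU cache over FIB entries: hits are served from the
    partial FIB; misses are resolved from the full FIB (standing in
    for the cloud proxy) and the least recently used entry is evicted."""
    def __init__(self, capacity, full_fib):
        self.capacity = capacity
        self.full_fib = full_fib
        self.cache = OrderedDict()
        self.hits = self.misses = 0

    def lookup(self, prefix):
        if prefix in self.cache:
            self.hits += 1
            self.cache.move_to_end(prefix)      # refresh recency
            return self.cache[prefix]
        self.misses += 1                        # resolve via cloud proxy
        nexthop = self.full_fib[prefix]
        self.cache[prefix] = nexthop
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)      # evict the LRU entry
        return nexthop

# Hypothetical skewed workload: 10 "heavy" prefixes carry all traffic.
full_fib = {f"p{i}": f"nh{i % 4}" for i in range(100)}
cache = LRUFibCache(capacity=25, full_fib=full_fib)  # one fourth of the FIB
for _ in range(100):
    for i in range(10):
        cache.lookup(f"p{i}")
miss_rate = cache.misses / (cache.hits + cache.misses)
```

Only the 10 cold-start lookups miss; every later lookup is a hit, which is the temporal/prefix continuity the previous slide relies on.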
24
Problem 3: Bottleneck Resolution
• BigData flows → long time scales
  – A few mins: fixed network behavior, fixed bottlenecks
  – Several hours: dynamic network behavior, moving bottlenecks
• We need to respond to network dynamics and resolve bottlenecks as the BigData flows run!
25
Where is the Bottleneck?
• Intra-node bottlenecks
[Figure: Source and destination end-systems (each with disks, CPUs, and NICs) connected through the Internet via relay nodes. Multiple parallel streams with inter-node network optimizations ignore intra-node bottlenecks; truly end-to-end multiple parallel streams use joint intra- and inter-node network optimizations.]
26
Leverage Multi-Core CPUs for Parallelism?
• Quality-of-Service (QoS) routing may help! But:
  – NP-hard to configure optimally
  – Route flaps
• Multi-core CPUs are abundant
  – How to leverage them in networking? [CCR'11]
• Can we use them to parallelize the protocols?
  – Multiple instances of the same protocol
  – Collaborating with each other
  – Each instance working on a separate part of the network
• A divide-and-conquer approach?
  – Should be done with minimal disruption → overlay
27
Parallel Routing
[Figure: A network of nodes A, B, C, D with link capacities of 5–10 Mb/s is sliced into three substrates (achieving 15, 5, and 5 Mb/s, respectively), each with its own link weights and each routed by a separate protocol instance.]
28
Parallel Routing
• Nice! But a new complication: how to slice out the substrates?
[Figure: A different slicing of the same network leaves each substrate only 5 Mb/s: the A-C link is maxed out in one substrate and the B-D link in another.]
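One way to make the slicing question concrete is as a capacity-partition check: per physical link, the shares handed to the substrates must not exceed the link's capacity. A sketch with hypothetical numbers loosely following the A-B-C-D example above (`valid_slicing` is an illustrative helper, not from the talk):

```python
def valid_slicing(link_caps, substrates):
    """Each substrate assigns a share (Mb/s) to some links; a slicing
    is feasible only if, per link, the shares sum to at most the
    physical capacity."""
    used = {}
    for sub in substrates:
        for link, share in sub.items():
            used[link] = used.get(link, 0) + share
    return all(used.get(link, 0) <= cap for link, cap in link_caps.items())

# Hypothetical physical capacities (Mb/s).
link_caps = {("A", "B"): 10, ("A", "C"): 5, ("B", "D"): 5,
             ("C", "D"): 10, ("B", "C"): 5}
good = [{("A", "B"): 5, ("C", "D"): 5},   # substrate 1
        {("A", "C"): 5, ("B", "D"): 5},   # substrate 2
        {("A", "B"): 5, ("C", "D"): 5}]   # substrate 3
bad = [{("A", "C"): 5}, {("A", "C"): 5}]  # A-C maxed out by one slice already
```

Feasibility is only half the problem; the harder question on this slide is which feasible slicing gives each protocol instance useful capacity.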
29
Summary
• Economic granularity
  – Finer, more flow-aware network architectures
  – An idea: contract switching / contract routing
• Routing scalability
  – Cheaper solutions to routers' CPU and memory complexity
  – An idea: CAR
• Bottleneck resolution
  – Complex algorithms to better resolve bottlenecks and respond to network dynamics
  – An idea: parallel routing
30
Thank you!
Google "contract switching"
Project Website: http://www.cse.unr.edu/~yuksem/contract-switching.htm
THE END
31
Collaborators & Sponsors
• Faculty
  – Mona Hella ([email protected]), Rensselaer Polytechnic Institute
  – Nezih Pala ([email protected]), Florida International University
• Students
  – Abdullah Sevincer ([email protected]) (Ph.D.), UNR
  – Behrooz Nakhkoob ([email protected]) (Ph.D.), RPI
  – Michelle Ramirez ([email protected]) (B.S.), UNR
• Alumnus
  – Mehmet Bilgi ([email protected]) (Ph.D.), UC Corp.
Acknowledgments: This work was supported by the U.S. National Science Foundation under awards 0721452 and 0721612 and by DARPA under contract W31P4Q-08-C-0080.
32
Computational Scenario
[Figure: The cloud proxy routers (1) exchange the full table with each other over the Internet, then (2) exchange outbound route filters with their cloud-assisted BGP routers, which finally (3) perform a partial table exchange with their peers.]
33
Delegation Scenario
[Figure: A cloud-assisted router facing peers in an IXP holds a FIB cache; its cloud proxy router holds the full FIB. Unresolved traffic is delegated to the proxy over the Internet, and the proxy sends cache updates back.]
34
Delegation Scenario
[Figure: Testbed setup. A traffic generator and a CAR Click router on Emulab (Utah) connect via IP GRE tunnels to a proxy Click router and traffic sink nodes on EC2 (N. Virginia).]
35
Delegation Scenario
• Cloud-assisted Click router with:
  – Packet counters for flows forwarded to the cloud and for received packets
  – Prefix-based miss ratio
  – Modified radix-trie cache for the forwarding table
  – Router controller: processes cache updates; clock-based cache-replacement vector
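A stripped-down version of such a forwarding-table trie can be written with stdlib Python only. This is a plain binary trie doing longest-prefix match, a simplification of the modified radix-trie cache mentioned above; class and next-hop names are illustrative:

```python
class TrieNode:
    __slots__ = ("children", "nexthop")
    def __init__(self):
        self.children = [None, None]
        self.nexthop = None

class PrefixTrie:
    """Binary trie over IPv4 prefixes with longest-prefix-match lookup."""
    def __init__(self):
        self.root = TrieNode()

    @staticmethod
    def _bits(addr, length):
        a, b, c, d = (int(x) for x in addr.split("."))
        v = (a << 24) | (b << 16) | (c << 8) | d
        return [(v >> (31 - i)) & 1 for i in range(length)]

    def insert(self, prefix, nexthop):
        addr, plen = prefix.split("/")
        node = self.root
        for bit in self._bits(addr, int(plen)):
            if node.children[bit] is None:
                node.children[bit] = TrieNode()
            node = node.children[bit]
        node.nexthop = nexthop

    def lookup(self, addr):
        # Longest-prefix match: remember the deepest next hop seen.
        node, best = self.root, None
        for bit in self._bits(addr, 32):
            node = node.children[bit]
            if node is None:
                break
            if node.nexthop is not None:
                best = node.nexthop
        return best

fib = PrefixTrie()
fib.insert("10.0.0.0/8", "nh-coarse")
fib.insert("10.1.0.0/16", "nh-fine")
```

A cached (partial) version would simply hold only the hot subset of prefixes and fall back to the cloud proxy when `lookup` returns no match.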
36
Simulation Results
• Random topology: inter-domain and intra-domain are random
• BRITE topology: BRITE model for inter-domain; Rocketfuel topologies (Abilene and GEANT) for intra-domain
• GT-ITM topology: GT-ITM model for inter-domain; Rocketfuel topologies (Abilene and GEANT) for intra-domain

Forwarding Mechanisms
37
• bTTL (budget TTL): how many copies of the discovery packet may be made and forwarded; provides a cap on messaging cost
• dTTL: time-to-live, i.e., a hop-count limit
• MAXFWD: maximum number of neighbors a discovery packet is forwarded to
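A toy implementation shows how the three knobs interact (the graph, function, and variable names are illustrative, and the real PVCR discovery is distributed rather than a central search):

```python
def discover(graph, src, dst, bttl, dttl, maxfwd):
    """Budgeted discovery flood: bTTL caps the total number of
    discovery-packet copies made, dTTL is a plain hop-count limit,
    and MAXFWD caps how many neighbors each copy is forwarded to.
    The path vector carried in each packet prevents loops.
    Returns (paths found, copies made)."""
    paths, copies = [], 0
    frontier = [[src]]                 # each entry is a partial path vector
    while frontier:
        path = frontier.pop(0)
        node = path[-1]
        if node == dst:
            paths.append(path)
            continue
        if len(path) - 1 >= dttl:      # hop-count limit reached
            continue
        for nbr in graph.get(node, [])[:maxfwd]:
            if nbr in path or copies >= bttl:
                continue               # loop prevention / budget exhausted
            copies += 1
            frontier.append(path + [nbr])
    return paths, copies

# Hypothetical 4-ISP topology.
topo = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
found, used = discover(topo, "A", "D", bttl=10, dttl=3, maxfwd=2)
```

Loop prevention is what keeps the copy count well below the bTTL bound, which matches the messaging-cost result reported later in the deck.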
Evaluation
• CAIDA AS-level Internet topology as of January 2010 (33,508 ISPs)
• Trials over 10,000 ISP pairs (src, dest), repeated 101 times
• Various ISP cooperation/participation and packet-filtering levels
  – NL: no local information used
  – L: local information used (with various filtering levels)
• No directional or policy improvements, for base-case (worst) performance
38
Results – Diversity
Tens of paths are discovered, favoring multi-path routing and reliability schemes.
39
Results – Path Stretch
40
Results – Messaging Cost
The number of discovery packet copies stays well below the theoretical bounds thanks to path-vector loop prevention.
41
Results – Transmission Cost
42
43
Results – Reliability
44
Many Possibilities
• Intra-cloud optimizations among routers receiving the CAR service
  – Data plane: forwarding can be done in the cloud
  – Control plane: peering exchanges and routing updates can be done in the cloud
• Per-AS optimizations
  – Data plane: packets do not have to go back to the physical router until the egress point
  – Control plane: iBGP exchanges
45
Some Interesting Analogies?
• High cloud-router delay
  – A CAR miss at the router ≈ a page fault
  – Delegation is preferable: forward the packet to the cloud proxy
• Low cloud-router delay
  – A CAR miss at the router ≈ a cache miss
  – Caching (i.e., immediate resolution) is preferable: buffer the packet at the router until the miss is resolved via the full router state at the cloud proxy
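The decision rule behind these analogies fits in a few lines; the 150 ms buffering budget is the figure used on the earlier caching slide, and the function and threshold names are illustrative:

```python
def handle_miss(rtt_to_cloud_ms, buffer_budget_ms=150):
    """With a low cloud-router delay, a CAR miss behaves like a cache
    miss: buffer the packet at the router and resolve the next hop from
    the proxy. With a high delay, it behaves like a page fault: tunnel
    the packet itself to the cloud proxy (delegation)."""
    if rtt_to_cloud_ms <= buffer_budget_ms:
        return "cache"     # buffer at the router until the miss resolves
    return "delegate"      # forward the packet to the cloud proxy
```

A real deployment would measure the proxy RTT continuously and could flip between the two modes per prefix.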
46
Intra-Node Bottlenecks
• Where is the bottleneck?
[Figure: An SFO end-system with two disks (holding file-to-NYC.dat and file-to-Miami.dat), two CPUs, and two 1 Gb/s NICs sends to NYC and Miami over the Internet; the internal disk-CPU-NIC links run at only 50–100 Mb/s.]
47
Intra-Node Bottlenecks
• Where is the bottleneck?
[Figure: Inter-node topology without intra-node visibility. The SFO end-system appears only as NIC 0 and NIC 1, with 100 Mb/s network paths toward NYC and Miami and 75 Mb/s cross paths; the internal disk-CPU-NIC links (50–100 Mb/s) are hidden.]
The network's routing algorithm finds the shortest paths to NYC and Miami with NIC 0 and NIC 1 as the exit points, respectively. However, the intra-node topology limits the effective transfer rates (to 50 Mb/s here).
48
Intra-Node Bottlenecks
• Where is the bottleneck?
[Figure: Integrated topology with the intra-node topology visible. Disks, CPUs, and NICs appear as graph nodes with their 50–100 Mb/s internal links alongside the 75–100 Mb/s network links, enabling a 75 Mb/s end-to-end path.]
When the intra-node topology is included in the routing algorithm's shortest-path calculation, it becomes possible to find better end-to-end combinations of flows for a higher aggregate rate.
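The gain from integrating the intra-node topology can be reproduced with a widest-path (maximum-bottleneck) computation. All capacities below are hypothetical, loosely following the slide's SFO example:

```python
import heapq

def widest_path(graph, src, dst):
    """Maximum-bottleneck ("widest") path: a single flow's achievable
    rate is the minimum link capacity along its path."""
    best = {src: float("inf")}
    pq = [(-best[src], src)]
    while pq:
        w, u = heapq.heappop(pq)
        w = -w
        if u == dst:
            return w
        for v, cap in graph.get(u, []):
            nw = min(w, cap)
            if nw > best.get(v, 0):
                best[v] = nw
                heapq.heappush(pq, (-nw, v))
    return 0.0

# Inter-node view only (Mb/s): the routing algorithm sees 100 Mb/s to
# NYC via NIC0, but the hidden Disk0->NIC0 path runs at 50 Mb/s.
internode = {"NIC0": [("NYC", 100)], "NIC1": [("Miami", 100), ("NYC", 75)]}
effective_rate = min(50, widest_path(internode, "NIC0", "NYC"))

# Integrated view: intra-node links (disk -> CPU -> NIC) become edges,
# so the widest path can leave via NIC1 at 75 Mb/s instead.
integrated = {
    "Disk0": [("CPU0", 50), ("CPU1", 75)],
    "CPU0": [("NIC0", 100)],
    "CPU1": [("NIC1", 100)],
    "NIC0": [("NYC", 100)],
    "NIC1": [("NYC", 75), ("Miami", 100)],
}
integrated_rate = widest_path(integrated, "Disk0", "NYC")
```

The integrated graph finds the 75 Mb/s exit the inter-node-only view cannot see, which is exactly the slide's point.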
49
CAR: An Architectural View
[Figure: The long transition from the current state of routing to a cloud-integrated next-generation future Internet: specialized ASICs (e.g., Cisco), parallel architectures (e.g., RouteBricks), clustered commodity hardware (e.g., Trellis, Google), separation of control and forwarding planes (e.g., OpenFlow), managing routers from the cloud (e.g., NEBULA, Cisco CSR), and routing as a service (e.g., RCP). Cloud-Assisted Routing (CAR) is a middle ground to realize this architectural transition.]
50
Technical Shortcomings
[Figure: ISP A peers with ISP B at two points (1 and 2); ISP B advertises the same AS-path B-C-D with 45 ms at one peering point and 35 ms at the other.]
51
Technical Shortcomings
[Figure: Among ISPs A–E, the advertised AS-path B-C-D takes 35 ms, while the alternative B-E-D takes only 25 ms.]