
Routing Economics under Big Data

Murat Yuksel (yuksem@cse.unr.edu)
Computer Science and Engineering, University of Nevada – Reno, USA

Outline

• Routing in a Nutshell
• BigData Routing Problems
  – Economic Granularity
  – Routing Scalability
  – Bottleneck Resolution

• Summary

Routing in a Nutshell

[Figure: resolving URL http://www.youtube.com to IP address 74.125.224.169 (prefix 74.125.224/24) and routing from a local ISP (NSHE) through Broad-BandOne, AT&T, Level3, and Pac-Net to Google. AS path: 1951-7018-3356-10026-15169, spanning local, regional, and Tier-1 ISPs.]

Routing in a Nutshell

[Figure: Internet hierarchy of local ISPs (e.g., NSHE, SBC), regional ISPs, and backbone (Tier-1) ISPs (e.g., AT&T, Cogent, Level3); customer/provider and peer/peer relationships form the Internet core and each provider's customer cone.]

Routing in a Nutshell

[Figure: example ASes – Broad-BandOne (AS 1951), AT&T (AS 7018), Level3 (AS 3356).]

• Inter-domain routing among ISPs:
  • Single metric (number of hops, ISPs)
  • Partial network information
  • Scalable
• Intra-domain routing within an ISP network:
  • Multi-metric (delay, bandwidth, speed, packet loss rate, ...)
  • Computationally heavy
  • Complete network information in terms of links
  • Not scalable for large networks

(A toy sketch contrasting the two appears below.)
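A minimal sketch of the contrast, under assumed inputs (the topology, metrics, and function names below are illustrative, not from the talk): inter-domain routing picks among advertised AS paths with a single metric, while intra-domain routing runs a link-state computation such as Dijkstra over the complete link map.

```python
import heapq

def pick_as_path(advertised_paths):
    """Inter-domain style: choose the shortest advertised AS path (fewest ISP hops)."""
    return min(advertised_paths, key=len)

def dijkstra(links, src, dst):
    """Intra-domain style: needs the complete link map and a per-link metric (delay, ms)."""
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        for v, w in links.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return list(reversed(path)), dist[dst]

# Hypothetical inputs for illustration only.
paths_to_google = [("1951", "7018", "3356", "10026", "15169"),
                   ("1951", "3356", "10026", "15169")]
intra_links = {"A": [("B", 5), ("C", 2)], "C": [("B", 1)], "B": []}

print(pick_as_path(paths_to_google))    # fewest ASes wins, using only partial/advertised info
print(dijkstra(intra_links, "A", "B"))  # full-topology, delay-based: (['A', 'C', 'B'], 3)
```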

What if the flow is big? Real big?

[Figure: Big-Data Alice sends to Big-Data Bob across NSHE, Broad-BandOne, AT&T, Level3, Pac-Net, and Google. At a few Mb/s the flow's cost/value is negligible; at 100+ Gb/s it is not. Flow-aware economics?]

Problem 1: Economic Granularity

[Figure: NSHE (AS 3851) buys transit toward "anywhere" from AT&T (AS 7018) and Level3 (AS 3356).]

• Point-to-anywhere deals
• Not automated; rigid SLAs (6+ months...)
• Transit service seen as a commodity
• Value-sharing structure: the edge gets all the money

Contract Routing Architecture

• An ISP is abstracted as a set of "contract links"
• Contract link: an advertisable contract
  – between peering/edge points i and j of an ISP
  – with the flexibility of advertising different prices for edge-to-edge (g2g) intra-domain paths
• Contract components (see the sketch below)
  – performance component, e.g., capacity
  – financial component, e.g., price
  – time component, e.g., term

Capability of managing value flows at a finer granularity than point-to-anywhere deals.

Global Internet 2008
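A minimal sketch of what a g2g contract-link advertisement might carry, assuming illustrative field names (this is not the exact schema of the contract-switching work):

```python
from dataclasses import dataclass

@dataclass
class ContractLink:
    isp: str                 # advertising ISP (name or AS number)
    ingress: str             # edge/peering point i
    egress: str              # edge/peering point j
    capacity_mbps: float     # performance component, e.g., offered capacity
    price_usd: float         # financial component, e.g., price for the term
    term_minutes: int        # time component, e.g., contract duration

# Example: ISP A advertises an edge-to-edge contract between its points 1 and 2.
link = ContractLink(isp="A", ingress="1", egress="2",
                    capacity_mbps=30, price_usd=8, term_minutes=30)
print(link)
```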

G2G Set-of-Links Abstraction

• Can change things a lot even for small scenarios..

G2G Set-of-Links Abstraction

• Max Throughput Routing
[Figure: average over 50 random topologies. ICC 2012]

G2G Set-of-Links Abstraction

• Min Delay Routing
[Figures: averages over 50 random topologies and over 50 BRITE topologies. ICC 2012]

Path-Vector Contract Routing

[Figure: User X attached to ISP A (edge points 1, 2, 3) requests a path to point 5 in ISP C; ISP B (point 4) sits between A and C.]

• User request: [5, 10-30Mb/s, 15-45mins, $10]
• Path requests propagate ISP by ISP, accumulating contract offers along the way:
  – [5, A, 1-2, 15-30Mb/s, 15-30mins, $8]
  – [5, A, 1-3, 5-10Mb/s, 15-20mins, $7]
  – [5, A-B, 1-2-4, 15-20Mb/s, 20-30mins, $4]
• Paths to 5 are found and ISP C replies to the user with two specific contract-path-vectors (a message sketch follows below):
  – [A-B-C, 1-2-4-5, 20Mb/s, 30mins]
  – [A-C, 1-3-5, 10Mb/s, 15mins]
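A hedged sketch of the message shapes shown on this slide; the tuple layout follows the slide's examples ([destination, ISP path, edge-point path, bandwidth range, duration range, price]), but the class and field names are my own, not PVCR's actual wire format:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PathRequest:
    destination: str                     # e.g., edge point "5"
    bandwidth_mbps: Tuple[float, float]  # acceptable range, e.g., (10, 30)
    duration_mins: Tuple[int, int]       # acceptable range, e.g., (15, 45)
    budget_usd: float                    # e.g., 10

@dataclass
class ContractPathVector:
    destination: str
    isp_path: List[str]                  # e.g., ["A", "B"]
    edge_path: List[str]                 # e.g., ["1", "2", "4"]
    bandwidth_mbps: Tuple[float, float]
    duration_mins: Tuple[int, int]
    price_usd: float

request = PathRequest("5", (10, 30), (15, 45), 10)
offer = ContractPathVector("5", ["A", "B"], ["1", "2", "4"], (15, 20), (20, 30), 4)
print(request, offer, sep="\n")
```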

Results – Path Exploration

• Over 80% path-exploration success ratio even at 50% discovery-packet filtering, thanks to the diversity of Internet routes.
• With locality, PVCR achieves near 100% path-exploration success.
• As the budget increases with bTTL and MAXFWD, PVCR becomes robust to filtering.

GLOBECOM 2012

Results – Traffic Engineering

[Figures: price surfaces over pairs of edge points (axes 1-10); the price axis spans roughly 20-30 in some panels and narrows to about 20-21 in others.]

• PVCR provides end-to-end coordination mechanisms.
• No hot-spots, no network bottlenecks.

ICC 2012

Problem 2: Routing Scalability

• Routing scalability is a burning issue!
  – Growing routing state and computational complexity
    • Timely lookups are harder to do
    • More control-plane burden
  – Growing demand for
    • Customizable routing (VPN)
    • Higher forwarding speeds
    • Path flexibility: policy, quality

Problem 2: Routing Scalability

• The cost of routing unit traffic is not scaling well
• Specialized router designs are getting costlier, currently > $40K
• BigData flows → more packets at faster speeds..
• How to scale routing functionality to BigData levels?

Offload the Complexity to the Cloud?

• Cloud services are getting abundant
  – Closer:
    • Delay to the cloud is decreasing [CloudCmp, IMC'10]
    • Bandwidth to the cloud is increasing
  – Cheaper:
    • CPU and memory are becoming commodities at the cloud
    • Cloud prices are declining
  – Computational speedups via parallelization
  – Scalable resources, redundancy

CAR: Cloud-Assisted Routing

• Goal: offload the growing routing complexity to the cloud
• Research question: if we maintain the full router functionality at the cloud but only partial functionality at the router hardware, can we solve some of the routing scalability problems?

[Figure: Router X (hardware with partial routing functions) exchanges updates and packets with its Proxy Router X (software with full routing functions), hosted in a cloud that provides CAR services to many routers.]

Use the valuable router hardware for the most used prefixes and the most urgent computations. Amdahl's Law in action! (A back-of-the-envelope sketch follows below.)
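One hedged way to read the Amdahl's Law remark (my illustration and numbers, not the talk's): if a fraction f of the routing workload can be offloaded to a cloud backend that handles it s times faster, the overall speedup is bounded by 1 / ((1 - f) + f / s).

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of the work is accelerated by factor s."""
    return 1.0 / ((1.0 - f) + f / s)

# Hypothetical example: offload 80% of the control-plane work to a 10x-faster cloud proxy.
print(round(amdahl_speedup(0.8, 10), 2))   # ~3.57x overall
```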

CAR: An Architectural View

[Figure: design space plotting flexibility (# of configuration parameters) against scalability (packets/sec), from specialized hardware through hybrid SW/HW to pure software, and from per-interface to per-flow to per-packet programmability, with more platform dependence toward the hardware end. Points include specialized ASICs (Cisco Catalyst series), NetFPGA, SwitchBlade [17], PacketShader [25], RouteBricks [7], Click, OpenFlow, RCP [8], and Cisco CSR [10]. CAR sits where the flexibility/scalability barrier is being pushed.]

CAR: A Sample BGP Peering Scenario

BGP peer establishment:
• 400K prefixes exchanged (full table)
• Takes approx. 4-5 minutes
• Peak CPU utilization
• Only 4K prefixes selected as best path

BGP peer establishment with CAR:
• 4K prefixes provided to the routers
• Outbound Route Filtering (ORF), RFC 5291
• Takes approx. 1-2 minutes
• Only selected alternative paths out of the 400K are installed later

Step 1: table exchange between proxies
Step 2: ORF list exchange between routers and proxies
Step 3: only the selected prefixes are exchanged initially between routers (a filtering sketch follows below)

CAR's CPU principle: keep the control plane closer to the cloud! Offload heavy computations to the cloud.

Potential for 5x speed-up and 5x reduction of CPU load during BGP peer establishment.
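A minimal sketch of the idea behind Steps 2-3 (outbound route filtering, RFC 5291): the router tells its cloud proxy which prefixes it actually needs, and only those routes are pushed initially. The data structures and names are illustrative; this is not a BGP implementation.

```python
from ipaddress import ip_network

def orf_filter(full_table, wanted_prefixes):
    """Keep only routes whose prefix falls within the router's ORF list."""
    wanted = [ip_network(p) for p in wanted_prefixes]
    return {p: nh for p, nh in full_table.items()
            if any(ip_network(p).subnet_of(w) for w in wanted)}

# Hypothetical example: the proxy holds the full table, the router asks for a small subset.
full_table = {"74.125.224.0/24": "peer1", "10.1.0.0/16": "peer2", "192.0.2.0/24": "peer1"}
orf_list = ["74.125.224.0/24", "192.0.2.0/24"]
print(orf_filter(full_table, orf_list))   # only the selected prefixes reach the router
```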

CAR: Caching and Delegation

[Figure: traffic is served from a partial FIB/RIB at the router, while the cloud proxy keeps the full FIB/RIB and performs regular updates and replacements, exploiting temporal and prefix continuity/spatiality.]

CAR: Caching and Delegation

[Figure: traffic is looked up in the partial FIB/RIB at the router. Hit (99.9%): forward locally. Miss (0.1%): either (1) hold the traffic in large buffers (150 ms) while the next hop is resolved from the cloud proxy, or (2) reroute the traffic to the cloud proxy via tunnels.]

Revisiting Route Caching: The World Should Be Flat, PAM 2009:
• One tenth of the prefixes account for 97% of the traffic
• One fourth of the FIB can achieve a 0.1% miss rate
• LRU replacement of the cache (see the cache sketch below)

CAR's memory principle: keep the data plane closer to the router. Keep the packet forwarding operations at the router to the extent possible.
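A minimal sketch of the caching-and-delegation split, assuming an LRU route cache at the router and a full table at the cloud proxy; the exact-prefix lookup below is a shortcut in place of longest-prefix match, and all names are illustrative rather than the CAR implementation:

```python
from collections import OrderedDict

class RouteCache:
    def __init__(self, capacity, cloud_full_fib):
        self.capacity = capacity
        self.cache = OrderedDict()            # prefix -> next hop, kept in LRU order
        self.cloud = cloud_full_fib           # stands in for the proxy's full FIB

    def lookup(self, prefix):
        if prefix in self.cache:              # hit: forward locally
            self.cache.move_to_end(prefix)
            return self.cache[prefix], "hit"
        next_hop = self.cloud[prefix]         # miss: resolve via the cloud proxy (or delegate)
        self.cache[prefix] = next_hop
        if len(self.cache) > self.capacity:   # evict the least recently used entry
            self.cache.popitem(last=False)
        return next_hop, "miss"

cloud_fib = {"74.125.224.0/24": "if0", "10.0.0.0/8": "if1", "192.0.2.0/24": "if0"}
rc = RouteCache(capacity=2, cloud_full_fib=cloud_fib)
for p in ["10.0.0.0/8", "10.0.0.0/8", "74.125.224.0/24", "192.0.2.0/24"]:
    print(p, rc.lookup(p))
```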

Problem 3: Bottleneck Resolution

• BigData flows → long time scales
  – A few minutes: fixed network behavior, fixed bottlenecks
  – Several hours: dynamic network behavior, moving bottlenecks

We need to respond to network dynamics and resolve bottlenecks as the BigData flows run!

Where is the Bottleneck?

• Intra-node bottlenecks

[Figure: a source end-system and a destination end-system (each with disks, CPUs, and NICs) connected across the Internet via relay nodes. Top: multiple parallel streams with inter-node network optimizations, but ignoring intra-node bottlenecks. Bottom: truly end-to-end multiple parallel streams with joint intra- and inter-node network optimizations.]

Leverage Multi-Core CPUs for Parallelism?

• Quality-of-Service (QoS) routing may help! But..
  – NP-hard to configure optimally
  – Route flaps
• Multi-core CPUs are abundant
  – How to leverage them in networking? [CCR'11]
• Can we use them to parallelize the protocols?
  • Multiple instances of the same protocol
  • Collaborating with each other
  • Each instance working on a separate part of the network?
• A divide-and-conquer?
  • Should do it with minimal disruption → overlay

Parallel Routing

[Figure: a four-node network (A, B, C, D) with 10 Mb/s and 5 Mb/s links is sliced into three routing substrates, each with its own link weights; flows of 15 Mb/s and 5 Mb/s are carried over different substrates.]

Parallel Routing

• Nice! But, a new complication: how to slice out the substrates? (A toy slicing sketch follows below.)

[Figure: the same four-node network sliced into three substrates differently; with flows of 5 Mb/s, 25 Mb/s, and 5 Mb/s, the A-C and B-D links are maxed out.]
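A toy sketch of the parallel-routing idea and of why slicing is the hard part (the topology, capacities, even-split policy, and hop-count metric below are my illustrative assumptions, not the talk's scenario): each substrate gets a share of every link and runs its own route computation.

```python
import heapq

links = {("A", "B"): 10, ("A", "C"): 5, ("B", "C"): 5, ("B", "D"): 5, ("C", "D"): 10}  # Mb/s

def slice_substrates(links, n):
    """Naive slicing: give each of n substrates an equal share of every link's capacity."""
    return [{e: c / n for e, c in links.items()} for _ in range(n)]

def shortest_path(substrate, src, dst):
    """Per-substrate route computation (hop count as the metric, links treated as undirected)."""
    adj = {}
    for (u, v) in substrate:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    heap, seen = [(0, src, [src])], set()
    while heap:
        hops, node, path = heapq.heappop(heap)
        if node == dst:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt in adj.get(node, []):
            heapq.heappush(heap, (hops + 1, nxt, path + [nxt]))

for i, sub in enumerate(slice_substrates(links, 3), 1):
    print(f"Substrate {i}: A->D via {shortest_path(sub, 'A', 'D')}")
```

With the naive equal split, every substrate sees the same scaled topology and computes the same path; that is exactly why deciding how to slice the substrates becomes the new problem.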

Summary

• Economic Granularity
  – Finer, more flow-aware network architectures
  – An idea: Contract-Switching, Contract Routing
• Routing Scalability
  – Cheaper solutions to routers' CPU and memory complexity
  – An idea: CAR
• Bottleneck Resolution
  – Complex algorithms to better resolve bottlenecks and respond to network dynamics
  – An idea: Parallel Routing

Thank you!

Google "contract switching"

Project website: http://www.cse.unr.edu/~yuksem/contract-switching.htm

THE END

Collaborators & Sponsors

• Faculty
  – Mona Hella (hellam@ecse.rpi.edu), Rensselaer Polytechnic Institute
  – Nezih Pala (palan@fiu.edu), Florida International University
• Students
  – Abdullah Sevincer (asev@cse.unr.edu) (Ph.D.), UNR
  – Behrooz Nakhkoob (nakhkb@rpi.edu) (Ph.D.), RPI
  – Michelle Ramirez (beemyladybug1@yahoo.com) (B.S.), UNR
• Alumnus
  – Mehmet Bilgi (mbilgi@cse.unr.edu) (Ph.D.), UC Corp.

Acknowledgments: This work was supported by the U.S. National Science Foundation under awards 0721452 and 0721612, and by DARPA under contract W31P4Q-08-C-0080.

Computational Scenario

[Figure: cloud-assisted BGP routers with their cloud proxy routers, connected to peers across the Internet. (1) Full table exchange between the proxies; (2) outbound route filter exchange between each router and its proxy; (3) partial table exchange between the routers.]

Delegation Scenario

[Figure: a cloud-assisted router facing peers in an IXP holds a FIB cache; its cloud proxy router holds the full FIB. Unresolved traffic is delegated to the proxy, and the proxy sends cache updates back to the router.]

Delegation Scenario

[Figure: testbed setup spanning Emulab (Utah) and EC2 (N. Virginia), connected by IP GRE tunnels: a traffic generator, a CAR Click router, a proxy Click router, and traffic sink nodes.]

Delegation Scenario

• Cloud-Assisted Click Router
  – Packet counters for
    • flows forwarded to the cloud
    • received packets
  – Prefix-based miss ratio
  – Modified radix-trie cache for the forwarding table
  – Router controller
    • processing cache updates
    • clock-based cache-replacement vector

Simulation Results

• Random topology
  – Inter-domain and intra-domain are random
• BRITE topology
  – BRITE model for inter-domain
  – Rocketfuel topologies (ABILENE and GEANT) for intra-domain
• GT-ITM topology
  – GT-ITM model for inter-domain
  – Rocketfuel topologies (ABILENE and GEANT) for intra-domain

Forwarding Mechanisms

• bTTL: how many copies of a discovery packet will be made and forwarded? Provides a cap on the messaging cost.
• dTTL: time to live, i.e., a hop-count limit.
• MAXFWD: maximum number of neighbors a discovery packet is forwarded to.

(A toy forwarding sketch using these knobs follows below.)
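A hedged sketch of how the three knobs might bound a PVCR discovery flood; the specific forwarding rule, the random neighbor choice, and the budget split are my assumptions for illustration, not the protocol's exact behavior:

```python
import random

def forward_discovery(node, neighbors, dttl, bttl, maxfwd, path):
    """Copy the discovery packet to at most maxfwd neighbors, respecting both budgets."""
    if dttl <= 0 or bttl <= 0:
        return []                                                # hop or copy budget exhausted
    candidates = [n for n in neighbors[node] if n not in path]   # path-vector loop prevention
    chosen = random.sample(candidates, min(maxfwd, bttl, len(candidates)))
    share = bttl // max(len(chosen), 1)                          # split the remaining copy budget
    return [(n, dttl - 1, share, path + [n]) for n in chosen]

# Hypothetical neighbor lists; each tuple is (next node, remaining dTTL, remaining bTTL, path).
neighbors = {"A": ["B", "C", "D"], "B": ["A", "E"], "C": ["A"], "D": ["A"], "E": ["B"]}
print(forward_discovery("A", neighbors, dttl=3, bttl=4, maxfwd=2, path=["A"]))
```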

Evaluation

• CAIDA AS-level Internet topology as of January 2010 (33,508 ISPs)
• Trials with 10,000 ISP pairs (src, dest), repeated 101 times
• With various ISP cooperation/participation and packet-filtering levels
  – NL: no local information used
  – L: local information used (with various filtering)
• With no directional and policy improvements, for base-case (worst) performance

Results – Diversity

[Figure] Tens of paths discovered, favoring multi-path routing and reliability schemes.

Results – Path Stretch

[Figure]

Results – Messaging Cost

[Figure] The number of discovery-packet copies is well below the theoretical bounds thanks to path-vector loop prevention.

Results – Transmission Cost

[Figure]

Results – Reliability

[Figure]

Many Possibilities

• Intra-cloud optimizations among routers receiving the CAR service
  – Data plane: forwarding can be done in the cloud
  – Control plane: peering exchanges and routing updates can be done in the cloud
• Per-AS optimizations
  – Data plane: packets do not have to go back to the physical router until the egress point
  – Control plane: iBGP exchanges

Some Interesting Analogies?

• High cloud-router delay
  – CAR miss at the router → page fault
  – Delegation is preferable
    • Forward the packet to the cloud proxy
• Low cloud-router delay
  – CAR miss at the router → cache miss
  – Caching (i.e., immediate resolution) is preferable
    • Buffer the packet at the router and wait until the miss is resolved via the full router state at the cloud proxy

Intra-Node Bottlenecks

• Where is the bottleneck?

[Figure: a sender at SFO with two disks (Disk0, Disk1), two CPUs (CPU0, CPU1), and two NICs (NIC0, NIC1) transfers file-to-NYC.dat and file-to-Miami.dat across the Internet to NYC and Miami; internal and access links run at 1 Gb/s, 100 Mb/s, and 50 Mb/s.]

Intra-Node Bottlenecks

• Where is the bottleneck?

[Figure: inter-node topology without intra-node visibility.] The network's routing algorithm finds the shortest paths to NYC and Miami with NIC 0 and NIC 1 as the exit points, respectively. However, the intra-node topology limits the effective transfer rates.

Intra-Node Bottlenecks

• Where is the bottleneck?

[Figure: integrated topology with visible intra-node topology (disks, CPUs, and NICs as graph nodes).] When the intra-node topology is included in the calculation of shortest paths by the routing algorithm, it becomes possible to find better end-to-end combinations of flows for a higher aggregate rate. (A toy bottleneck-path sketch follows below.)
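A toy sketch of that point, using my own numbers and a max-bottleneck (widest-path) search assumed for illustration: once intra-node links (disk → CPU → NIC) are part of the graph, the computation can pick an exit NIC that avoids an internal 50 Mb/s link.

```python
import heapq

def widest_path(graph, src, dst):
    """Return the maximum achievable bottleneck bandwidth from src to dst."""
    best = {src: float("inf")}
    heap = [(-float("inf"), src)]
    while heap:
        neg_bw, u = heapq.heappop(heap)
        if u == dst:
            return -neg_bw
        for v, cap in graph.get(u, []):
            bw = min(-neg_bw, cap)
            if bw > best.get(v, 0):
                best[v] = bw
                heapq.heappush(heap, (-bw, v))
    return 0.0

# Hypothetical integrated topology: intra-node links plus wide-area paths to NYC (Mb/s).
graph = {
    "Disk0": [("CPU0", 100)],
    "CPU0": [("NIC0", 50), ("NIC1", 100)],
    "NIC0": [("NYC", 1000)],
    "NIC1": [("NYC", 1000)],
}
print(widest_path(graph, "Disk0", "NYC"))   # 100: exit via NIC1, avoiding the 50 Mb/s CPU0->NIC0 link
```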

CAR: An Architectural View

[Figure: the long transition from the current state of routing to a cloud-integrated, next-generation future Internet: specialized ASICs (e.g., Cisco), parallel architectures (e.g., RouteBricks), clustered commodity hardware (e.g., Trellis, Google), separation of control and forwarding planes (e.g., OpenFlow), routing as a service (e.g., RCP), and managing routers from the cloud (e.g., NEBULA, Cisco CSR). Cloud-Assisted Routing (CAR) is a middle ground to realize the architectural transition.]

Technical Shortcomings

[Figure: edge points 1 and 2 connect through ISP A and ISP B; the same AS path B-C-D measures 45 ms from one and 35 ms from the other.]

Technical Shortcomings

[Figure: ASes A, B, C, D, E; the AS path B-C-D takes 35 ms while the alternative AS path B-E-D takes 25 ms.]


CAR: Caching and Delegation
