Center of Excellence Wireless and Information Technology CEWIT 2008 TeraPaths: Managing Flow-Based End-to-End QoS Paths Experience and Lessons Learned Dimitrios Katramatos, Dantong Yu, Kunal Shroff Brookhaven National Laboratory Thomas Robertazzi Stony Brook University Shawn McKee University of Michigan


Center of Excellence Wireless and Information Technology
CEWIT 2008

TeraPaths: Managing Flow-Based End-to-End QoS Paths
Experience and Lessons Learned

Dimitrios Katramatos, Dantong Yu, Kunal Shroff
Brookhaven National Laboratory

Thomas Robertazzi
Stony Brook University

Shawn McKee
University of Michigan


Abstract

• TeraPaths is a Department of Energy-funded network research project to support efficient, predictable, and prioritized peta-scale data replication in modern high-speed networks

• The TeraPaths network management framework establishes on demand, and then manages, true end-to-end, QoS-aware virtual network paths across multiple administrative network domains

• TeraPaths dedicates network resources to data flows specifically authorized to use such network paths, in a transparent and scalable manner. This ensures that only selected flows receive a pre-determined, guaranteed level of QoS in terms of bandwidth, jitter, delay, etc.


Speaker’s Biography

• Dantong Yu, Brookhaven National Laboratory

• Dantong Yu received the Ph.D. degree in Computer Science from the State University of New York at Buffalo, USA, in 2001. His research interests include high-speed network performance, network Quality of Service, cluster/grid computing, information retrieval, data mining, databases, and data warehouses. He leads the large-volume WAN data transfers between CERN, BNL, and the ATLAS and RHIC collaboration institutes over high-speed networks with Grid middleware.


Outline

• Background: the TeraPaths project
• Establishing flow-based end-to-end QoS paths
• Domain interoperation
• Encountered issues and proposed solutions
• Project status and future work
• Conclusions


Background

• Provide QoS guarantees at the individual data flow level, all the way to the end hosts, transparently
  – Data flows have varying priority/importance
    • Video streams
    • Critical data
    • Long duration transfers
  – Default “best effort” network behavior treats all data flows as equal
  – Capacity is not unlimited
    • Congestion causes bandwidth and latency variations
    • Performance and service disruption problems, unpredictability
• Dynamic flow-based SLAs = schedule network utilization
  – Regulate and classify (prioritize) traffic


End-to-End Setup

[Figure: end-to-end setup. Site A (hosts a1, a2), Site B (host b1), and Site C (host c1) are connected across WAN domains through host routers, site border routers, a virtual border router, a combined site host/border router, and regional provider routers. ACL labels: "a1 b1, a2 c1"; "b1 a1"; "c1 a2". VLAN X and VLAN Y are labeled with the addresses 10.100.1.x1, 10.100.1.x2, 10.100.1.y1, and 10.100.1.y2.]


Establishing End-to-End QoS Paths

• Multiple administrative domains
  – Cooperation, trust, but each maintains full control
  – Heterogeneous environment
  – Domain controller coordination through web services
• Coordination models
  – Star
    • Requires extensive information for all domains
  – Daisy chain
    • Requires common flexible protocol across all domains
  – Hybrid (end-sites first)
    • Independent protocols
    • Direct end site negotiation


Path Setup (2)

• End site subnets are configured by TeraPaths software instances (TeraPaths Domain Controllers or TDCs)
  – TDCs configure end site LANs to prioritize and regulate authorized flows via the DiffServ framework at the network device level
  – Source site polices/marks authorized flow packets
  – Destination site admits/re-polices/re-marks packets
  – End site LANs tx/rx marked packets to/from the WAN
• WAN provides MPLS tunnels or dynamic circuits
  – Initiating TDC requests MPLS tunnel or dynamic circuit with matching bandwidth and lifetime, or…
  – TDC groups flows with common src/dst into MPLS tunnel or dynamic circuit with aggregate bandwidth and lifetime
  – WAN preserves packet markings
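
The policing and marking steps above can be illustrated with a token-bucket sketch. This is only a model of what a DiffServ-capable network device does, written in Python for readability; TeraPaths configures real devices rather than running code like this. The class name, the burst parameter, and the choice to re-mark out-of-profile packets as best effort (instead of dropping them) are assumptions made for the example.

```python
from dataclasses import dataclass

EF = 46  # DSCP code point for Expedited Forwarding
BE = 0   # DSCP default (best effort)


@dataclass
class TokenBucketPolicer:
    """Hypothetical per-flow policer: in-profile packets are marked EF."""
    rate_bps: float     # reserved bandwidth, bits per second
    burst_bits: float   # bucket depth, bits
    tokens: float = 0.0
    last: float = 0.0   # arrival time of the previous packet, seconds

    def __post_init__(self):
        self.tokens = self.burst_bits  # start with a full bucket

    def mark(self, now: float, size_bits: int) -> int:
        """Return the DSCP value for a packet arriving at time `now`."""
        elapsed = now - self.last
        self.last = now
        # Refill tokens for the elapsed time, capped at the bucket depth.
        self.tokens = min(self.burst_bits, self.tokens + elapsed * self.rate_bps)
        if size_bits <= self.tokens:
            self.tokens -= size_bits
            return EF
        # Assumption: out-of-profile traffic is re-marked, not dropped.
        return BE
```

A destination site that re-polices incoming traffic could apply the same check again before admitting packets into its LAN.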


Path Setup (3)

• WAN domains interoperate
  – Each end site’s TDC has a single point of contact for WAN services
  – TDCs have no knowledge of WAN internals other than what is exposed by the WAN services
    • End sites have no direct control over the WAN
• Either tunnel or circuit through WAN
  – TeraPaths does not mix and match layer 2 and layer 3 technologies
• TeraPaths “proxy” servers
  – Implement interface required by TeraPaths core
  – Hide WAN service differences
  – Clients to WAN web services (currently OSCARS / DRAGON)
    • Close cooperation with ESnet and I2 development teams
  – Submit reservations for MPLS tunnels or dynamic circuits
  – Handle security requirements
  – Handle errors
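
One way to picture the proxy role described above is a common interface with interchangeable backends. The sketch below is hypothetical throughout: `WanProxy`, `PathRequest`, and `InMemoryProxy` are invented names, not the TeraPaths or OSCARS/DRAGON APIs, and the in-memory backend merely stands in for a web-service client.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class PathRequest:
    src_site: str
    dst_site: str
    bandwidth_mbps: int
    start: int  # epoch seconds
    end: int    # epoch seconds


class WanProxy(ABC):
    """Hypothetical interface seen by the core; not the real TeraPaths API."""

    @abstractmethod
    def reserve(self, request: PathRequest) -> str:
        """Submit a reservation and return its identifier."""

    @abstractmethod
    def cancel(self, reservation_id: str) -> bool:
        """Cancel a reservation; return False if it is unknown."""


class InMemoryProxy(WanProxy):
    """Stand-in backend. A real proxy would call the WAN web service here."""

    def __init__(self, kind: str):
        self.kind = kind  # e.g. "tunnel" (layer 3) or "circuit" (layer 2)
        self._ids = count(1)
        self._active = {}

    def reserve(self, request: PathRequest) -> str:
        # Basic error handling before anything is sent to the WAN.
        if request.end <= request.start or request.bandwidth_mbps <= 0:
            raise ValueError("invalid reservation window or bandwidth")
        rid = f"{self.kind}-{next(self._ids)}"
        self._active[rid] = request
        return rid

    def cancel(self, reservation_id: str) -> bool:
        return self._active.pop(reservation_id, None) is not None
```

Because the core only depends on `WanProxy`, a tunnel-based and a circuit-based backend can be swapped without changing the caller.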


Addressing L2-Specific Issues

• Limitations with VLANs
  – Tag range (tentatively selected 50 VLANs: 3550 to 3599)
    • Each site may have its own range
  – Tag conflicts
    • Rely on WAN service
    • Eliminate by synchronizing site databases
    • VLAN renaming (if/when possible)
• Scalability issues
  – Limited number of VLAN tags/circuits:
    • Flow grouping / circuit consolidation
      – Forward flows through same virtual WAN circuit
        » Create circuit with new parameters / switch current flows / cancel old circuit
        » Modify WAN reservations (if/when possible)
  – PBR overhead
    • Virtual border router
• Sensitive/3rd party network segments
  – VLAN pass-thru
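
The tag-conflict problem can be made concrete with a small sketch: when VLAN renaming is not available, both end sites need a tag that lies in each site's range and is unused at both. The function name and the idea of passing each site's in-use tags explicitly (as if the site databases were synchronized) are assumptions for illustration only.

```python
def pick_vlan_tag(range_a, used_a, range_b, used_b):
    """Return the lowest VLAN tag free at both sites, or None if none exists.

    range_a/range_b: iterables of tags each site may use.
    used_a/used_b:   tags currently in use at each site.
    """
    free_a = set(range_a) - set(used_a)
    free_b = set(range_b) - set(used_b)
    common = free_a & free_b
    return min(common) if common else None


# Both sites use the tentative range from the slide (3550 to 3599).
tag = pick_vlan_tag(range(3550, 3600), {3550, 3551},
                    range(3550, 3600), {3552})
```

With only 50 tags per site, the function returns `None` quickly under load, which is the scalability limit that flow grouping is meant to relieve.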


Flow Grouping/Circuit Consolidation

• Flows between the same source and destination sites can share a circuit; policing maintains the bandwidth guarantees

• Multiple TeraPaths reservations associate with the same circuit reservation

– Easy when requirements are known in advance

– Modification of reservations required otherwise

• Selection/optimization to minimize resource waste

• Trade-off based on Δbw (bandwidth difference), Δtb, Δta (time period before and after a reservation)
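
To make the trade-off concrete, the sketch below computes Δbw, Δtb, and Δta for a reservation placed inside a candidate circuit, plus the bandwidth-time that would go unused. The interpretation of the three deltas (circuit minus reservation bandwidth, lead time before the reservation starts, lag time after it ends) and the `Slot` type are assumptions; the slides do not give formulas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """A bandwidth allocation over a time window (hypothetical type)."""
    start: float  # seconds
    end: float    # seconds
    bw: float     # Mbps


def deltas(circuit: Slot, res: Slot):
    """Return (Δbw, Δtb, Δta) of a reservation relative to a circuit."""
    d_bw = circuit.bw - res.bw        # spare bandwidth
    d_tb = res.start - circuit.start  # circuit time before the reservation
    d_ta = circuit.end - res.end      # circuit time after the reservation
    return d_bw, d_tb, d_ta


def wasted(circuit: Slot, res: Slot) -> float:
    """Bandwidth-time reserved on the circuit but not used by `res`."""
    return (circuit.bw * (circuit.end - circuit.start)
            - res.bw * (res.end - res.start))
```

A selection policy could, for example, pick the candidate circuit with the smallest `wasted` value among those whose deltas are all non-negative.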

[Figure: bandwidth-versus-time diagrams showing reservations 1 to 5, annotated with Δt, Δbw, and the current time.]


Flow Grouping/Circuit Consolidation (2)

• Similar approach to disk buffering (read ahead / write behind)
  – Bring up ahead / teardown behind
  – Reuse existing active circuits
  – Reserve circuits with more bandwidth and longer duration depending on differences in start time, duration, bandwidth of reservations
  – Delay teardown, modify circuit duration and/or bandwidth if possible
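
The grouping idea can be sketched as a simple sweep: reservations between the same pair of sites are merged into one circuit whenever the gap to the next reservation is short enough that keeping the circuit up is preferable to tearing it down. The `max_gap` threshold, the type names, and sizing the circuit to the peak concurrent bandwidth are all assumptions for this example, not the project's algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Res:
    start: float
    end: float
    bw: float


@dataclass(frozen=True)
class Circuit:
    start: float
    end: float
    bw: float


def peak_bandwidth(group):
    """Largest sum of bandwidths active at the same time within a group."""
    events = []
    for r in group:
        events.append((r.start, r.bw))
        events.append((r.end, -r.bw))
    # At equal times, process releases (negative) before new allocations.
    events.sort()
    load = peak = 0
    for _, delta in events:
        load += delta
        peak = max(peak, load)
    return peak


def consolidate(reservations, max_gap):
    """Group reservations into circuits, bridging gaps of at most max_gap."""
    circuits, group, group_end = [], [], None
    for r in sorted(reservations, key=lambda x: x.start):
        if group and r.start - group_end > max_gap:
            circuits.append(Circuit(group[0].start, group_end,
                                    peak_bandwidth(group)))
            group = []
        group.append(r)
        group_end = r.end if len(group) == 1 else max(group_end, r.end)
    if group:
        circuits.append(Circuit(group[0].start, group_end,
                                peak_bandwidth(group)))
    return circuits
```

This offline version assumes all reservations are known in advance; handling requests that arrive later would require modifying or replacing an existing circuit reservation.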

[Figure: bandwidth-versus-time diagrams showing reservations 1 to 5 relative to the current time, annotated with Δtb and Δta.]


Limitation of Dynamic Circuits

• A recent incident in BNL’s LHCOPN subnet:
  – Cisco’s PBR implementation only uses the status of an interface to decide whether or not to forward packets
  – A network circuit breaks somewhere along the path, but the involved interfaces on both ends are still up
  – No probes and/or heartbeat exist to check the “health” of circuits
  – Fail-over to the backup link does not work since primary interfaces are up even when such a problem exists
• End site monitoring is the most effective way to detect such a problem


Active Circuit Probing

Each TeraPaths site instance periodically verifies the “well being” of reservations:
– Selects active reservations initiated by the site (site responsibility)
– Finds the circuit/VLAN associated with each reservation
– Performs a circuit check with a quick ping of the other site’s router (private IP address space)
– Less than 100% success triggers a recheck with longer duration pings in both directions (to and from the other site)
– A low success percentage triggers reservation cancellation, reverting traffic to the best effort network
– Optionally, the system adapts reservation data and attempts to set up a new end-to-end path (for a given time period/number of attempts)
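
The two-stage check described above can be sketched as a small decision function. The `probe` callable, the ping counts, and the 80% threshold are assumptions chosen for the example; the slides state only that anything below 100% triggers a longer recheck and that a low success rate triggers cancellation.

```python
from typing import Callable


def check_circuit(probe: Callable[[str, int], float],
                  quick_count: int = 5,
                  long_count: int = 50,
                  min_success: float = 0.8) -> str:
    """Return "healthy" or "cancel" for one reservation's circuit.

    probe(direction, count) is a hypothetical helper that sends `count`
    pings in the given direction ("forward" or "reverse") and returns the
    fraction that succeeded.
    """
    # Stage 1: quick ping of the other site's router.
    if probe("forward", quick_count) >= 1.0:
        return "healthy"
    # Stage 2: longer recheck in both directions.
    forward = probe("forward", long_count)
    reverse = probe("reverse", long_count)
    if min(forward, reverse) < min_success:
        return "cancel"  # caller reverts traffic to best effort
    return "healthy"
```

Passing the probe in as a function keeps the decision logic testable without sending real traffic; a caller receiving "cancel" could then optionally try to set up a replacement path.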


Prioritizing Traffic

[Figure: TeraPaths QoS test 1 (prioritize traffic). Bandwidth (Mbits/sec, 0 to 1200) versus time (sec, 0 to 1000) for the priority, background, and total traffic series. Annotations: "competing traffic causes dramatic drop in bandwidth" and "QoS / circuit reservation active".]


Recovering from Circuit Failure

[Figure: TeraPaths QoS test 2 (prioritize/fallback to best effort). Bandwidth (Mbits/sec, 0 to 1200) versus time (sec, 0 to 1000) for the priority, background, and total traffic series. Annotations: "circuit interruption" and "recovery to best effort".]


Competing against BE traffic

[Figure: two plots, each with a vertical axis from 0 to 10000. Left panel, "remote EF against remote and local BE" (horizontal axis 0 to 500): series remote EF, local BE, remote BE. Right panel, "remote EF against local BE" (horizontal axis 0 to 600): series remote EF, local BE, local BE.]


Status

• BNL, UMich, BU, all with 10Gbps connections, multiple pass-thru configurations (BNL, UMich, NoX, Merit, MiLR)

• Utilization of L3 paths (MPLS tunnels, ESnet only), L2 paths (dynamic circuits, ESnet and Internet2)

• Multiple QoS reservations through same circuit (support for circuit consolidation)

• Multiple circuits per site subject to per-site VLAN availability (flow grouping/circuit consolidation)

• Active circuit probing for failures with fallback to best effort network/attempt to reconfigure e2e path (in testing phase)

• Dynamic bandwidth allocation within service classes (in testing phase)

• New command line client


Future Work

• Continue working on automatic flow grouping / circuit consolidation.

• Configurable reservation negotiation
• Grid-style AAA (GUMS/VOMS)
• Plug-ins: SRM (dCache), others
• Compatibility with Lambda Station
• Support for different hardware as needed
• ATLAS Production:
  – Replicate ATLAS Physics data from BU and UMich with the existing ATLAS DDM stack, and with end-to-end QoS circuits
  – Tier 1 (BNL) and Tier 2 data replication

• http://www.terapaths.org


Conclusions

• Demonstrated the effective prioritization and protection from interference of selected data transfers between three LHC experiment institutes (Brookhaven National Laboratory, the University of Michigan, and Boston University) through guaranteed-bandwidth virtual paths, in the presence of intensive best-effort IP traffic sharing the same network resources

• A practical and economical end-to-end network resource reservation system, extending new capabilities to users/applications of end sites without requiring additional, expensive network infrastructure components