
January 25, 2004 Joint Techs, Honolulu Hawaii Routing WG

How I spent last summer:Converting MAX to 2547bis VPNsConverting NGIX/DC atm to gige

Dan MagorianDirector, Engineering and Operations

[email protected]

2 Talks about 2 big projects

• Plus installing 2 new pops in Balt and DC/George Washington U: it was a busy summer.

• Technically, the conversion of the NGIX/DC NAP to gige was all L2, not L3 routing.

• But it did involve an interesting use of Cisco's RFC 1483 bridged-routed ATM encap conversion, to single-endedly convert each Fednet peer at the NAP to gige without a massive, horrible flag-day cutover. So I thought folks might be interested.

Talk 1: MAX’s 2547 vpn background

• Heard Ivan Gonzalez of Juniper's presentation to the routing WG in July 2002 on the CalREN 2547 mockup. Examined his configs; looked useful & feasible.

• MAX in 2002 was offering the standard mix of routes from Abilene, DREN, ESnet, vBNS, etc. peerings. Everything working, nobody interested in 2547.

• But pressure from customers mounted to resell Qwest ISP service. Ran tests for 6 months with UMD as ISP and GU as customer, and found routing problems. Mocked up 2547 topology on MAX’s inexpensive lab routers, then deployed.

Mid-Atlantic Crossroads

[Diagram: MAX regional infrastructure connecting national/international peering networks (Abilene, Network Virginia, Qwest ISP, ATDnet, vBNS, NGIX-DC, DREN, NISN, NREN, ESnet, Supernet) with regional network participants and institutional participants.]

Participants shown: National Oceanic and Atmospheric Administration; National Science Foundation; Naval Research Lab & ATDnet; Smithsonian Institution; Southeastern Universities Research Association; University Consortium for Advanced Internet Development; University of Maryland; University of Maryland, Baltimore County; University System of Maryland; U.S. Geological Survey; The Catholic University of America; George Washington University; Georgetown University; Howard Hughes Medical Institute; Gallaudet University; NASA/Emerging Technologies Center; NASA/Goddard Space Flight Center; National Consortium for Supercomputer Applications/ACCESS; National Institutes of Health; National Library of Medicine.

MAX Core Optical Network

[Diagram: MAX core optical network — M160 and M40e routers, a GigE switch, and an OADM at the CLPK (College Park), BALT (Baltimore), DCGW/DCNE (Washington D.C.), and ARLG (Northern Virginia) pops; an OC48 lambda via the MD state net to Qwest, an OC48c POS lambda via Luxn WDM to Abilene, OC48c POS links between pops, and the NGIX connecting 6 nets.]

So what are RFC 2547 L3 VPNs or, Why should you let MPLS onto your network?

• Probably everyone's heard the "fish problem" talks & knows about policy-constrained routing issues.

• Don't want to bore everyone with fish again, or characterizations of atm/frame vs L2 vs L3 vpns. Many people have done that better than I could.

• This is from an operational perspective: what caused MAX to convert to them, what the pros and cons are, and why regional (not just national) service providers might consider using them.

• Policy Constrained Routing / Explicit Routing objectives

• Solve the long-standing "fish problem" by using a single router node to create multiple policies or "routing instances"

• Use more than the destination as criteria for the routing decision

• At minimum, use Source (VPN membership or L3 info) + Destination for the route decision

• Technology evolution offers a solution: RFC 2547bis and MPLS

Policy Constrained Routing Review

(PCR/ER slides: Juniper Networks, Inc., © 2002)

PCR/ER Overview

[Diagram: eight routers A–H across six ASes (ASN65001–ASN65006), interconnected by EBGP at the edges, with IBGP inside.]

REQUIREMENTS:
- Send traffic from router "D" across router "F"
- Send traffic from router "E" across router "G"
- Routers "D" and "E" are connected to "A"


PCR/ER – The Challenge

[Diagram: routers A, B, and C, each with a single inet.0 table.]

- Routes learned from EBGP peers GRN and RED
- Router "A" learns routes through IBGP
- The BGP path-selection process runs; ONE active path is selected to reach the destination "yellow" network
- Both customers use the single best path – undesirable in this case


PCR/ER – Solution (Control Plane): 2547bis L3VPN

[Diagram: routers A, B, and C; router A now holds per-customer tables (grn.inet.0, red.inet.0, cust1.inet.0, cust2.inet.0) alongside inet.0.]

- Routes learned from EBGP peers GRN and RED
- Router "A" learns routes through IBGP
- The BGP path-selection process runs; an active path is selected for each VRF to reach the destination "yellow" network
- Each VRF still has access to all routes; these routes can be set with different preference values
- Finer granularity can be achieved by using FBF + 2547bis/MPLS on customer-facing ports
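As a concrete illustration, here's a minimal JunOS routing-instance sketch of the per-customer-table idea (all names and numbers hypothetical; MAX's real config appears later in this deck):

routing-instances {
    GRN-VRF {                          /* hypothetical VRF name */
        instance-type vrf;
        interface ge-0/0/1.100;        /* customer-facing port */
        route-distinguisher 65000:1;   /* hypothetical RD */
        vrf-target target:65000:1;     /* import/export target community */
    }
}

Routes from that interface land in GRN-VRF.inet.0 instead of inet.0, so each VRF runs its own best-path selection.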


That’s fine, but why should you care?

• 2547 is widely used by many national service providers to create overlay networks for different customers. Many run IP on the edge, MPLS in the core.

• Should mention Cisco routers might have similar VRF capability. Don’t know, can’t speak to that.

• Yet gigapops don’t usually have overlay networks, and people usually find workarounds to fish problems.

• But in this case we ran into a show-stopper that caused us to really need to deploy them.

What Juniper doesn’t tell you about 2547

• They're called Routing Instances (VRFs), but they AREN'T virtual routers. JunOS has virtual routers in 6.1.

• 2547 vpns have only one iBGP. What happens is that a Route Distinguisher and route-target BGP communities are added, putting routes into separate tables. The IGP remains the same.

• The catch is that interfaces need to be in one VRF or another; they can't be in multiples. There are tricky workarounds with next-table policies (see the sketch at the end of this list), but they didn't work well in our situation. Customers don't know about 2547.

• So, since we eBGP with everyone, we establish second peerings with ISP customers. A nuisance, but it works.
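For the record, a minimal sketch of the next-table flavor of workaround mentioned above (hypothetical: a static default in one table handing unmatched lookups to the VRF's table):

routing-options {
    static {
        /* hypothetical: fall through into the Q VRF's table */
        route 0.0.0.0/0 next-table Q.inet.0;
    }
}

The one-iBGP, separate-tables contrast shows up in real router output below: first a plain inet.0 route, then a 2547 VPN route carrying a Route Distinguisher, a target community, and a VPN label.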

4.1.0.0/16 (1 entry, 1 announced)
        *BGP    Preference: 170/-81
                Source: 138.18.47.37
                Next hop: 138.18.47.37 via ge-3/2/1.175, selected
                State: <Active Ext>
                Local AS: 10886 Peer AS: 668
                Age: 3d 19:11:46
                Task: BGP_668.138.18.47.37+179
                Announcement bits (6): 0-KRT 3-LDP 5-BGP.0.0.0.0+179 6-Resolve inet.0 7-Resolve inet.2
                AS path: 668 1455 I
                Communities: 668:100 10886:5
                Localpref: 80
                Router ID: 138.18.9.55

3.0.0.0/8 (1 entry, 1 announced)
        *BGP    Preference: 170/-81
                Route Distinguisher: 206.196.177.247:1
                Source: 206.196.177.247
                Next hop: via so-2/0/0.0, selected
                Label operation: Push 100048
                Protocol next hop: 206.196.177.247 Push 100048
                Indirect next hop: eedd1f8 782
                State: <Secondary Active Int Ext>
                Local AS: 10886 Peer AS: 10886
                Age: 2d 3:04:00   Metric: 522   Metric2: 1
                Task: BGP_10886.206.196.177.247+179
                Announcement bits (2): 0-KRT 2-BGP.0.0.0.0+179
                AS path: 209 7018 80 I
                Communities: 209:888 209:889 10886:8 target:209:1
                VPN Label: 100048
                Localpref: 80
                Router ID: 206.196.177.247
                Primary Routing Table bgp.l3vpn.0

So what was the big problem?

• We mark routes from upstreams with BGP communities, and use them to subset which routes are advertised to downstream customers.

• E.g., everyone gets Abilene, DREN, ESnet, and MAX, but only a few get vBNS (yes, we still have it).

• Started out doing the same w/ ISP: a BGP community to control Qwest route advertisement, as sketched at the end of this list.

• Then discovered that we were blackholing traffic from certain non-Qwest customers, e.g. NLM.
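A minimal sketch of that community-based subsetting (the community names match the definitions later in this deck; the policy names are hypothetical):

policy-statement from-QWEST {
    /* tag everything learned from the Qwest peering */
    term tag {
        then {
            community add QWEST;
            accept;
        }
    }
}
policy-statement to-NONSUBSCRIBER {
    /* strip Qwest routes from customers who didn't buy ISP service */
    term no-qwest {
        from community QWEST;
        then reject;
    }
    term rest {
        then accept;
    }
}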

Exploring further

• We found that the main problem was gigapops that advertise unequal prefix-length announcements to their ISP and Abilene. If everyone's were equal, it would be fine.

• But we discovered that it's unfortunately fairly common for GPs to aggregate towards Abilene and not aggregate, or aggregate less, towards their ISPs. Very undesirable.

• So when NLM saw a route and sent traffic to MAX, a more-specific Qwest route not advertised to them could take precedence, even though they hadn't subscribed to Qwest ISP. So that traffic would be dropped (illustrated after this list).

• After some attempts to get the sources to fix it, we gave up.
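A hypothetical illustration with made-up prefixes: NLM subscribes to Abilene only, but longest-match forwarding in the single table still pulls its traffic toward Qwest:

    10.1.0.0/16   learned from Abilene   advertised to NLM
    10.1.5.0/24   learned from Qwest     NOT advertised to NLM

NLM sends to 10.1.5.9; MAX's table prefers the /24, so the traffic follows the Qwest path NLM never bought, and is dropped.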

So we realized that

• Most gigapops have only one service offering, and give everyone a mix of I2 and ISP routes.

• The problem would only get worse once we had customers not subscribed to Abilene (we now do).

• There wasn't an easy way out that could "get back" routes not preferred by BGP when subsetted by BGP communities for advertisement.

• We had seen the same issue in a minor way with vBNS, but the NGIX Abilene/vBNS peering solved it.

• Lots of folks run 2547, and Juniper supports it well.

OK, so how hard was it to deploy?

• Went quite smoothly. We had Ivan's configs that had worked with the CalREN mockup. Ben Eater of Juniper gave lots of help, but we mocked it up ourselves in our lab.

• The main issue was what to do about the dual-peering requirement. Since we BGP with everyone except 2 downtown offices, we decided to simplify and make everyone the same. One went away, and UCAID/DC is converting soon. Lots of them would have been a problem.

• Basically, we turned on the new VRF, ran for a while with no issues, then turned on the Qwest peering & new customers.

• Did cheat and keep the existing inet.0: Juniper recommends doing everything as VRFs. Better, but a nasty cutover.

2547 TEST

[Diagram: lab mockup — a mock MAX core (AS10, running OSPF) of routers O1–O5 (loopbacks 10.100.1.1/32–10.100.1.3/32) linked over 10.0.1/30, 10.0.2/30, and 10.0.3/30, aggregating 10.0.0.0/8. Attached over fxp1/fxp2 subinterfaces: a participant (AS200), Qwest (AS300, ISP traffic on 10.10.3/30), Abilene (AS400 via O6, I2 traffic on 10.10.2/30), and a Lab-7505 playing an I1 + I2 participant (all traffic on 10.10.4/30), with edge links 10.10.0/30 and 10.10.1/30, aggregates 192.168.100.0/24, 172.100.0.0/16, 15.0.0.0/8, and 192.200.0.0/16, and a static to 192.168.200.0/24.]

There’ve got to be some downsides

• The two-virtual-circuit requirement is the biggest. We couldn't have done it if we had lots of statics to small customers. Ethernet is easy, yet carriers like Yipes don't support trunking and multiple VLANs. SONET & DSXs need frame-relay encap for DLCIs.

• Also have to get used to tracing & pinging within other routing tables (examples after this list), & oddities about source interfaces.

• And what MPLS is doing can be fuzzy when the core and edge routers are the same boxes, like a national net in 4 boxes.

• Plus your BGP becomes unusual, e.g. for Arbor monitors.
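For example, a few operational commands using the VRF name Q from the config on the next slide (the address is just an illustration):

    ping 4.1.1.1 routing-instance Q
    traceroute 4.1.1.1 routing-instance Q
    show route table Q.inet.0

Without a source address inside the VRF, pings may leave with an address the far side can't route back to — hence the "oddities about source interfaces."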

Sample JunOS showing VRF (partial)

routing-instances {
    Q {
        instance-type vrf;
        /* UMD-ISP */
        interface ge-2/2/0.1;
        /* GU-ISP */
        interface at-1/0/0.4;
        /* HHMI-ISP */
        interface ge-2/2/1.6;
        route-distinguisher 206.196.177.246:1;
        vrf-import QWEST-IMPORT;
        vrf-export QWEST-EXPORT;
        routing-options {
            rib Q.inet.0 {
                aggregate {
                    route 206.196.176.0/21 {
                        community 10886:1;
                        passive;

protocols {
    bgp {
        group ISP {
            type external;
            export [ AGGREGATE-ROUTES NO-MAX-SPECIFICS ];
            neighbor 206.196.177.50 {
                description UMD-ISP;
                import from-UMD;
                export [ DEFAULT MAX QWEST REJECT ];
                peer-as 27;
            }
            neighbor 206.196.177.154 {
                description HHMI-ISP;
                import from-HHMI;
                export [ NO-DEFAULT QWEST REJECT ];
                peer-as 16692;

policy-statement QWEST-IMPORT {
    term 10 {
        from community QWEST-VRF;
        then accept;
    }
    term 20 {
        then reject;
    }
}
policy-statement QWEST-EXPORT {
    term 10 {
        then {
            community add QWEST-VRF;
            accept;

community ABILENE members 10886:3;
community DREN members 10886:5;
community ESNET members 10886:6;
community MAX members 10886:1;
community NISN members 10886:4;
community NREN members 10886:9;
community QWEST members 10886:8;
community QWEST-CUST members 209:209;
community QWEST-VRF members target:209:1;
community VBNS members 10886:2;
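Note how the pieces tie together: QWEST-EXPORT stamps routes with the QWEST-VRF community (target:209:1), and QWEST-IMPORT accepts routes carrying it; that target community is what actually pulls routes into the Q VRF's table.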

So do we recommend it?

• Sure. Wasn't that hard, no real war stories, didn't cost anything except the time to scope it out.

• Solves nicely the problem caused by our being late into ISP resale, and reselling each separately.

• Can have as many overlay nets as we want for different service offerings (mostly different ISPs)

• Might also be handy if you don't want to mix low-cost ISP routes (e.g. Cogent) w/ higher-cost ones (e.g. Qwest).

• Might be trickier for GPs with lots of small customers.

Thanks!

Questions?

Talk 2: Converting NGIX/DC to gige

• MAX has run the Fednet peer point for Abilene, vBNS, DREN, NISN, NREN, and USGS for 4 years.

• Previously called NGIX/East Coast, it is the meet-me point (NAP) for the major Federal R&E nets.

• The topology was a full L2 mesh of p-t-p ATM PVCs. No transit peerings, no common route pool. Doesn't scale to larger peer points, but kept life simple.

• Then ATM began to go out of style, and the customer nets began pushing for frame-based (gige) peering.

Took us a while to get it done

• We contemplated several architectures, including putting all nets on Junipers directly and using translational cross-connects (TCCs) that allow bolting ethernets to ATMs to SONETs.

• Ended up deciding to use a conventional switch, because of the high cost of 10G router interfaces ($110k vs $30k switched), and also no IPv6 over TCC.

• Did a side-by-side comparison of the Cisco 6500 vs the Extreme Black Diamond. Liked the Extremes, but no OC12 ATM, and the winning feature was Cisco's OSM-2OC12-ATM 1483 bridged-routing blade.

RFC 1483: Multiprotocol encap over AAL5

• One of the many reasons, in addition to LANE, why people run screaming from ATM (actually, if you only do PVCs, ATM is easy and reliable AND can do 9K and even 64K MTUs). I'm no 1483 expert.

• There are many variations involving routed and bridged PDUs. Basically, bridged is used by SPs who need to connect ethernets via an ATM cloud, which is what Marconi's gige interface does: useless for us. Juniper does a minimal routed subset w/ 1 IP addr, good only for DSLAMs. Others as well; complex.

Bridged-routed connect: the magic

• There are several variants of routed PDUs across different platforms, mostly also useless to us.

• But on the 6500 OSM blades, Cisco has developed a 1483 bridged/routed encap that does just what we wanted: allow ethernets to connect to ATMs. Uses proxy ARP and other ugly stuff internally.

• The best part was that we only needed it for the transition, so we borrowed the $60K blade for 6 months & then returned it when all the ATM was gone.

So why are we discussing 1483 anyway?

• Converting a national NAP from ATM to gige isn't like changing over your campus net at midnight. Replacement lines have to be arranged, peer partners have to coordinate interfaces, and downtime has to be minimal because they're paying for uptime.

• The biggest issue was how to do the transition without massive flag-day changeover nightmare

• With testing, we realized that the Cisco OSM blade's bridged-routing would allow us to single-endedly change over each peer at midnight without any of their full mesh of peers needing to be there.

How did we do it?

• We hooked up both OSM OC12 ATMs to the Marconi ATM switch (would have been a problem if there had been more traffic than would fit, but there wasn't).

• All peers were assigned new vlans matched to their old PVCs. The 6500 bridged-routed encap allowed us to make IP connections across. In some sense it was a subset of TCCs, but cheaper.

• When each peer had their new line and gige interface ready with the new vlans, we hot-cut their line at both ends to gige, moving the /30 w/o the peers knowing!

Sample bre-connect 6500 IOS (partial)

interface ATM4/1
 description marconi 1D1
 mtu 9186
 no ip address
 atm bridge-enable
 switchport trunk allowed vlan 161,170,178
 switchport mode trunk
!
interface ATM4/1.161 point-to-point
 pvc USGS-Abilene 161/32
  bre-connect 161
!
interface GigabitEthernet3/5
 description USGS
 mtu 9216
 no ip address
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 158,161,162,178
 switchport mode trunk
!
interface Vlan161
 description ABIL-USGS ACTIVE gige
 no ip address
 mtu 9216
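The key line is the bre-connect 161 under PVC USGS-Abilene 161/32: it splices that ATM PVC into VLAN 161, so a peer's old ATM side and its replacement gige side share one bridged segment, which is what let us move the /30 across without the far end noticing.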

How did it go? Had to be some war story

• Of course there was.

• Overall, had to be VERY careful at midnight cutting PVCs.

• Testing initially didn't work; took a conference with the Cisco product manager to resolve issues. Then fine. Had an incredible lab jury-rig of ATM switch & routers in multiple buildings.

• NASA's NISN was first, easy because they had spare fiber to GSFC, so we didn't have to hotcut.

• Then Abilene was easy, because MAX sells them lambda transport to the NGIX; we just had to change the WDM.

• USGS, NLM, and MAX were also easy hotcuts.

• But MCI's DREN and vBNS were a nightmare.

The war story

• Caveat: MCI did a VERY good job; not their fault.

• MCI is the contractor for both the original vBNS and, in 2002, brought up the new DREN. MCI had a Verizon OC12 from their pop to MAX for years; I suggested both share it.

• With the summer gige conversion, they were unable to get a replacement gige line to their pop. Decided to keep the OC12 ATM, and I installed their colo M10 to run TCCs.

• Juniper and Andrea Reitzel tested in the lab; it worked.

• At the midnight cut, the vBNS TCCs failed; had to back out at 5am. Eventually had to loan MCI MAX's on-site test routers.

• Scott Robohn of Juniper found we needed old gige firmware.

• After another 4am-er, got it working, but TCCs still touchy.

So now that’s done…

• All peers off atm, the OSM blade went back

• Now the config is totally boring, pure L2 on one 16-port gige blade, runs beautifully.

• Will be upgrading Abilene’s peering to 10G next week, already tested Luxn 10G transponders.

• NREN joined NGIX, using GSFC wdm transport

• Subsequently bought a lab 6503 to test future IOSes and other changes. An upcoming release will have L2 support for netflow; for now it's just interface stats.

Thanks! Questions?

Dan Magorian

[email protected]