© Big Switch Networks

Case Study: Industry’s Largest NFV Deployment

Collaboration between Red Hat, Dell and Big Switch for a tier-1 US Service Provider embracing large scale NFV deployments on OpenStack with SDN

NFV deployments represent some of the most demanding workloads in OpenStack clouds, yet the economic and operational promise of NFV makes this a high-value technical challenge for service providers worldwide. This paper will discuss the collaboration between Dell, Red Hat and Big Switch for a tier-1 US service provider in the industry’s largest deployment of NFV infrastructure to date. Four key areas of collaboration were needed to bring the deployment from lab to production:

§ Resiliency & Performance at Scale

§ Design & Deployment Flexibility

§ Reducing Operational Complexity

§ Integrating Security & Analytics

Software developers from Red Hat and Big Switch worked together on a daily basis over months, leveraging over $1m of test hardware from Dell, to accelerate the open community engineering process and deliver a high quality, validated NFV Pod architecture. As a result of the collaboration, multiple improvements were made to upstream open source code to align the final Pod design with key design and operational considerations of a large service provider network infrastructure.

Figure 1: Pod Design At A Glance (diagram components: Big Switch SDN Controllers, physical appliance pair; Switch Light OS on Spine, 40G Dell ON switches; Switch Light OS on Leaf, 10G/40G Dell ON switches; Red Hat OpenStack 7.1 with Neutron; Red Hat Enterprise Linux with Switch Light VX on Dell R630 Compute Nodes)

This SDN/NFV collaboration highlights the open source leadership of Red Hat, the SDN expertise of Big Switch and the proven service and support at scale from Dell.




Key Network Design Challenges

The key network challenges for the OpenStack NFV pod deployment fell in five major categories:

• Resiliency At Scale: To achieve scale, the design followed a hyperscale-inspired “core and pod” approach¹, with a 12 rack pod design replicated across a number of data centers across the US. The 12 rack pod, a multi-million dollar investment, was replicated at both Dell and Big Switch labs to test the system under stress. Resiliency was required at every level – in the vSwitch, the leaf, the spine, the network services and the ingress/egress to the data center core and other pods.

• No Bandwidth Bottlenecks: NFV workloads put extreme stress on the network in many dimensions – east/west bandwidth, north/south bandwidth, intra-vSwitch bandwidth and logical L2/L3 bandwidth. Neither bandwidth limitations from legacy protocols like spanning tree nor packet hair-pinning across the fabric for overlay gateway purposes were acceptable, yet VNF instances needed to be provisioned in any rack at any time. The system as a whole required optimized bandwidth characteristics from vSwitch to leaf to spine in both normal running operations and in partial failure scenarios.

• Logical Network Design Flexibility: The pod design needed to accommodate NFV workloads that each had unique logical network requirements, yet needed to share the same physical leaf/spine fabric and vSwitches. Rather than a one-size-fits-all L2/L3 approach, this design needed to accommodate NFV-specific public L2 networks, public L3 networks, private L2 networks, tenant-managed service chains with FWaaS and LBaaS, provider-managed service chains transparent to the tenants, virtual tenant network functions, physical provider network functions with capacity for high bandwidth broadcast and a range of connectivity options to numerous external networks. All of these options needed to be mixed-and-matched in peaceful co-existence in the same physical pod at the same time, with relevant provisioning workflows automated by OpenStack.

• Reduced Operational Complexity: Operational complexity for the NFV deployment for this engagement came in two forms: a) lifecycle management of the network control systems relative to the OpenStack control systems, and b) training for design/install/troubleshooting of the network control system itself. The first required tight integration between Big Switch and Red Hat. The end result – a leaf-spine CLOS fabric that can be upgraded in less time than an iPhone without impacting production workloads or OpenStack control systems – is unique in the industry. The second leveraged Big Switch’s “One Big Switch” metaphor, detailed below.

• Integrated Security & Visibility: To ensure that the NFV Pod is compliant and secure against intrusions and other threats, it was important to design an out-of-band monitoring capability for E-W traffic as well as an inline protection mechanism for N-S traffic as a part of the overall Pod design. Key requirements from this visibility infrastructure were: a scale-out design that grew with the Pod scale; support for multi-tenant/multi-tool environments; and ease of deployment and operation.

Pod Design At A Glance

The general pod design includes one services/connectivity/control rack and 12 compute racks (Figure 1).

§ Services/connectivity/control rack holds the SDN controllers, OpenStack controllers, various physical provider-side network services and the ingress/egress gateways to networks connecting to the pod. While this rack represents only 10% of the physical space, it represents 90% of the engineering effort involved in the design.

§ Compute racks are intended as a scale-out design, with 12 per pod in the initial deployment. This was designed to evolve over time as more capacity per location is required, and some locations have power/cooling constraints and require flexibility in server density. The networking for each compute rack features the Big Switch Switch Light OS running at each top of rack, on Dell ON switch hardware. The first generation pod design used Open vSwitch, while the second generation uses Big Switch Switch Light VX (a “P+V” Fabric Design) running on Dell compute nodes.

¹ See this article co-authored by Petr Lapukhov, Architect at Facebook, and Kyle Forster, Founder of Big Switch: http://www.infoworld.com/article/2608992/data-center/data-center-rethinking-the-data-center-network.html


For network visibility and monitoring, SPAN ports from each top of rack switch were intended to integrate with Big Switch’s Big Monitoring Fabric. This enabled on-demand and granular E-W traffic monitoring (including intra-host traffic using RSPAN). In phase 1, DDoS mitigation tools were connected inline to protect all N-S traffic and managed from the Big Monitoring Fabric controller.

Resiliency at Scale

To validate the resiliency of the NFV pod design at scale, large scale test beds (>$1.5m each) were constructed in both Dell and Big Switch facilities. The cross-vendor team used a “Chaos Monkey” methodology pioneered by Netflix, culminating in a test with 640 forced network failures in under 30 minutes with no impact to workload performance.²

In a ‘chaos monkey’ style test, random network failures were injected into the pod while running ‘worst case’ workloads, including the Hadoop TeraSort benchmark. Within the testing window, Big Cloud Fabric SDN controllers were forced to fail over every 30 seconds, a random switch was forced to fail every 8 seconds and a random link was forced to fail every 4 seconds.
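The stated injection rates can be checked against the headline figure of 640 failures in under 30 minutes. The sketch below is a back-of-the-envelope calculation only. It assumes, which the case study does not state, that the three failure types were injected concurrently and strictly periodically; the function and variable names are illustrative.

```python
from fractions import Fraction

# Injection intervals in seconds, taken from the test description above.
INTERVALS_S = {
    "controller_failover": 30,
    "switch_failure": 8,
    "link_failure": 4,
}


def failures_in_window(window_s, intervals=INTERVALS_S):
    """Count events per failure type in a window of `window_s` seconds.

    Assumption: each failure type fires strictly periodically and all
    three schedules run at the same time.
    """
    return {name: window_s // period for name, period in intervals.items()}


def window_for_total(total, intervals=INTERVALS_S):
    """Approximate window length (seconds) needed to reach `total` events."""
    combined_rate = sum(Fraction(1, period) for period in intervals.values())
    return float(total / combined_rate)
```

Under these assumptions, `failures_in_window(1800)` gives 60 controller fail-overs, 225 switch failures and 450 link failures (735 in total) for a full 30 minutes, while `window_for_total(640)` comes to roughly 1,567 seconds, or about 26 minutes. That is consistent with the reported "640 forced network failures in under 30 minutes".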

No Bandwidth Bottlenecks

NFV workloads put extreme stress on the network in many dimensions – east/west bandwidth, north/south bandwidth, intra-vSwitch bandwidth and logical L2/L3 bandwidth. A leaf-spine CLOS design, popularized by Google³, has become the common approach for extreme east/west/north/south bandwidth requirements. However, the traditional alphabet soup of protocols used to replicate the Google design with legacy networking products often leaves data center designs that are extremely fragile in the face of partial failures, particularly at the host, or that significantly constrain workload placement. For VNF deployments, these downsides make these approaches a non-starter. A modern leaf-spine CLOS design, using centralized SDN control designed to see the network from spine to leaf to vSwitch, was the optimal answer for this design.

Figure 3: Leaf-Spine Clos Fabric Architecture

² For more details on Big Switch’s Chaos Monkey testing for OpenStack networking, see http://go.bigswitch.com/rs/bigswitchnetworks/images/Chaos%20Monkey%20and%20Big%20Cloud%20Fabric.pdf

³ For a history of leaf-spine CLOS designs at Google, see http://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p183.pdf

Leaf-spine CLOS extended all the way down to the vSwitch:

§ Maximized bandwidth use across all active links

§ Designed-in coverage of all partial failure cases from vSwitch to leaf to spine to controllers to OpenStack orchestration (compared to ‘alphabet soup’ of protocols)

§ Fully distributed L3 and Floating IP functions (no packet hair-pins)

§ End-to-end analytics and troubleshooting tools from vSwitch to leaf to spine
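To make the point about using all active links and tolerating partial failures concrete, the following sketch computes a leaf switch's aggregate uplink capacity and oversubscription ratio before and after a spine failure. It is a generic leaf-spine calculation, not the deployed design: the port counts and spine count in the usage note are hypothetical (the case study gives only the 10G/40G link speeds), and it assumes one uplink from each leaf to each spine with traffic spread across every surviving uplink.

```python
def leaf_uplink_capacity_gbps(spines, uplink_gbps=40, failed_spines=0):
    """Aggregate uplink capacity of one leaf, one uplink per surviving spine.

    Assumption: every surviving uplink forwards traffic, with no links
    blocked the way spanning tree would block redundant paths.
    """
    if not 0 <= failed_spines <= spines:
        raise ValueError("failed_spines must be between 0 and spines")
    return (spines - failed_spines) * uplink_gbps


def oversubscription(server_ports, server_gbps, spines,
                     uplink_gbps=40, failed_spines=0):
    """Ratio of server-facing bandwidth to fabric-facing bandwidth on a leaf."""
    uplink = leaf_uplink_capacity_gbps(spines, uplink_gbps, failed_spines)
    if uplink == 0:
        raise ValueError("leaf is isolated: no surviving uplinks")
    return (server_ports * server_gbps) / uplink
```

For a hypothetical leaf with 48 x 10G server ports and a 40G uplink to each of 4 spines, `oversubscription(48, 10, 4)` is 3.0 in normal operation and `oversubscription(48, 10, 4, failed_spines=1)` is 4.0 with one spine down: the leaf loses a quarter of its fabric bandwidth rather than all connectivity.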

Figure 2: Data Center Scale Test Setup (diagram labels: Big Cloud Fabric SDN controllers as centralized control plane; virtual machine racks with vSwitches; bare metal servers & storage; services & connectivity racks; scale-out ingress/egress)


Logical Network Flexibility

The pod design needed to accommodate NFV workloads that each had unique logical network requirements, yet needed to share the same physical leaf/spine fabric and vSwitches. Rather than a one-size-fits-all L2/L3 approach, this design needed to accommodate numerous NFV-specific L2/L3/service designs. These included:

• Public L2 networks with workload-specific routers for ingress/egress

• Public (routable) L3 networks connected via BGP and static routes to the various service provider networks

• Private L2 networks for workloads requiring inter-VNF broadcast and L2 multicast connectivity

• Tenant-managed service chains with FWaaS, LBaaS and other services managed by workload-specific teams on their operational schedules

• Provider-managed service chains, transparent to the tenants, to serve as corporate standards across a wide variety (but not all) of the NFV workloads loaded onto the pod

• A mix of both virtual network functions and physical network functions inserted into the service chains mentioned above to service NFV workloads

• A mix of both virtual network functions and part-virtual/part-physical network functions making up an NFV workload (i.e. specialized physical equipment and high rate storage)

Where applicable, workflows required for provisioning these networks needed to be orchestrated through OpenStack APIs and User Interfaces.
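As an illustration of what provisioning through OpenStack APIs can look like, the sketch below builds request bodies for the OpenStack Networking (Neutron) v2.0 API for two of the network types listed above. It only constructs the JSON payloads and does not contact a cloud. The network names, the CIDR and the placeholder network ID are hypothetical, and nothing here is taken from the deployment's actual workflows.

```python
import ipaddress
import json


def network_body(name, shared=False, external=False):
    """Body for POST /v2.0/networks.

    `shared` and `router:external` are standard Neutron network attributes;
    setting them normally requires administrative rights.
    """
    return {
        "network": {
            "name": name,
            "admin_state_up": True,
            "shared": shared,
            "router:external": external,
        }
    }


def subnet_body(network_id, cidr, enable_dhcp=True):
    """Body for POST /v2.0/subnets; rejects a malformed CIDR early."""
    net = ipaddress.ip_network(cidr)  # raises ValueError on bad input
    return {
        "subnet": {
            "network_id": network_id,
            "ip_version": net.version,
            "cidr": str(net),
            "enable_dhcp": enable_dhcp,
        }
    }


if __name__ == "__main__":
    # Hypothetical private L2 network for inter-VNF traffic.
    print(json.dumps(network_body("vnf-private-l2"), indent=2))
    # Hypothetical public (routable) network; the ID is a placeholder.
    print(json.dumps(subnet_body("<network-id>", "203.0.113.0/24"), indent=2))
```

Keeping the payload construction separate from the HTTP call makes the request bodies easy to review and unit test before they are sent by whatever client the operator uses.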

Reduced Operational Complexity

NFV designs in the lab can be incredibly complex, representing unbounded operational risk. To address those risks, ease of deployment and management of day-to-day operations were critical elements for this design.

§ OpenStack Deployment: This was addressed with a powerful, simplified and automated cloud installation tool from Red Hat, the RHEL OSP 7 director, which also provides system-wide health checking and complete lifecycle management. The integration of the BCF networking installer with RHEL OSP 7 director provides a completely integrated workflow that not only makes the system installation process seamless and predictable, but also ensures the stability and rapid convergence of the system upon subsequent upgrade of the system components.

§ Pod Operations: In order to make this system intuitive for networking professionals, the pod design used Big Cloud Fabric’s “One Big Switch” operational metaphor (Figure 5). From an operations perspective, the SDN controllers feel/act just like chassis supervisors, while the spine switches feel just like a chassis backplane and the leaf and vSwitches feel just like chassis line cards. This metaphor dramatically reduced the training required when integrating this new pod into existing operational processes.

Figure 4: RHEL OpenStack Platform Director


Figure 5: One "Big Switch"

With complex NFV workloads riding on top of a layer of OpenStack automation which itself is riding on top of an SDN fabric, network health, history and troubleshooting tools were a key challenge for the deployment. With integration from vSwitch to leaf to spine, the visibility of the Big Cloud Fabric “P+V” design dramatically reduced operational concerns with this kind of deployment. According to a recent ACG research study, these tools allow for troubleshooting 12x faster than traditional network designs for these types of pods⁴.

Integrated Security & Visibility

To ensure that the NFV Pod is compliant and secure against intrusions and other threats, Big Monitoring Fabric was used to monitor East-West traffic (intra-pod) and North-South traffic (inline). Big Monitoring Fabric is provisioned and managed through a centralized, single pane of glass: the Big Monitoring Fabric controller CLI, GUI or REST APIs. In addition to delivering relevant traffic to dedicated tools (e.g. a DDoS appliance in an inline deployment), Big Monitoring Fabric also supports built-in analytics and troubleshooting as shown in Figure 6.

⁴ The entire ACG study, showing 12x faster troubleshooting times, 20x faster software upgrade times and 12x faster pod expansion times, is available at http://go.bigswitch.com/rs/974-WXR-561/images/Economic%20Advantages%20of%20Open%20SDN%20Fabrics%20-%20ACG%20Research.pdf

[Figure 5 diagram labels: Traditional Chassis Pair (supervisors, backplane, line cards); Big Cloud Fabric controller, spine switches, leaf switches; compute workload and services & connectivity racks]

Figure 6: Integrated Visibility & Analytics (panels: Health, History, Machine-assisted troubleshooting)

To Learn More

§ Big Cloud Fabric Overview: More details available at: http://bigswitch.com/sdn-products/big-cloud-fabric

§ Red Hat OpenStack Platform Overview: More details available at: https://access.redhat.com/products/red-hat-openstack-platform

§ Big Monitoring Fabric Overview: More details available at: http://bigswitch.com/products/big-monitoring-fabric

§ Big Switch Labs: Get hands-on experience with the seamless integration of OpenStack and Big Cloud Fabric (P+V Edition) using Big Switch’s Neutron plugin. Available online, for free: http://labs.bigswitch.com

§ BCF Starter Kits: Big Switch offers this fully tested, scalable OpenStack networking solution in several Big Cloud Fabric starter kits, pre-configured with hardware, cables, support and physical + virtual Big Cloud Fabric software starting at $49k. For more details, download the brochure at: http://bigswitch.com/starter-kits

§ Test Setup Details: Details of the scale testing architecture and chaos monkey testing installation and methodology available on request. Email info@bigswitch.com.

ABOUT BIG SWITCH

Big Switch Networks is the market leader in bringing hyperscale data center networking technologies to a mainstream data center audience. The company is taking three key hyperscale technologies -- OEM/ODM bare metal and open Ethernet switch hardware, sophisticated SDN control software, and core-and-pod data center designs -- and leveraging them in fit-for-purpose products designed for use in enterprises, cloud providers, and service providers. For additional information, [email protected], follow @bigswitch, or visit www.bigswitch.com.

Big Switch Networks, Big Cloud Fabric, Big Monitoring Fabric, Switch Light OS, and Switch Light VX are trademarks or registered trademarks of Big Switch Networks, Inc. All other trademarks, service marks, registered marks, or registered service marks are the property of their respective owners.