Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
P802.1Qczinterworkingwithotherdatacentertechnologies
JesusEscudero-Sahuquillo,PedroJavierGarcia,FranciscoJ.Quiles,JoseDuato
IEEE802.1PlenaryMeeting,SanDiego,CA,USAJuly8,2018
Executivesummary
Existingcongestioncontrolmechanismsworkend-to-end.Weneedcomplementarymechanismsthatreactquicklywhentransientcongestionappears,alsopreventingHoL blockingfromdegradingperformance.
Outline
• Whycongestionisolationisneeded?• Analysisofcongestionscenarios• Limitationsofcurrenttechnologies• CongestionIsolationinDCNs• Conclusions
Outline
• Whycongestionisolationisneeded?• Analysisofcongestionscenarios• Limitationsofcurrenttechnologies• CongestionIsolationinDCNs• Conclusions
• TodaysDatacenters(DCNs)requireaflexiblefabricforcarryinginaconvergentwaytrafficfromdifferenttypesofapplications,storageofcontrol.• FabricdesignforDCNsmustminimizeoreliminatepacketloss,providehighthroughputandmaintainlowlatency.• Thesegoals arecrucialforapplicationsofOLDI,DeepLearning,NVMe overFabricsandtheCloudified CentralOffices.• However,congestion threatensthesegoals.
Whycongestionisolationisneeded?
Paul Congdon et al: The Lossless Network for Data Centers. NENDICA “Network Enhancements for the Next Decade” Industry Connections Activity, IEEE Standards Association, 2018.
• HoL-blockingdramaticallydegradesthenetworkperformance(e.g.PFChasnotenoughgranularityandthereisnocongestedflowidentification).• Classicale2econgestioncontrolforlosslessnetworksisdifficulttotune,reactsslowly,andmayintroduceoscillationsandinstability[1].
HSstarts
HSends
HS = traffic injected to Hot Spot destination
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0 1e+06 2e+06 3e+06 4e+06 5e+06Network Throughput (normalized)
Time (nanoseconds)
1QITh
VOQnet
64-node CLOS network, 4 hot-spots
Whycongestionisolationisneeded?
[1] Jesús Escudero-Sahuquillo, Ernst Gunnar Gran, Pedro Javier García, Jose Flich, Tor Skeie, OlavLysne, Francisco J. Quiles, José Duato: Combining Congested-Flow Isolation and Injection Throttlingin HPC Interconnection Networks. ICPP 2011: 662-672
• Weneedacongestionisolation(CI)mechanismthatreactsquicklywhentransientcongestionsituationsappear,preventingnetworkperformancedegradationcausedbytheHoL blocking.• WewantaCImechanismthatcomplementsothertechnologies availableintheDCNs,sothatCIimprovestheirperformance,whiletheothersreducetheCIcomplexity.
Whycongestionisolationisneeded?
Outline
• Whycongestionisolationisneeded?• Analysisofcongestionscenarios• Limitationsofcurrenttechnologies• CongestionIsolationinDCNs• Conclusions
AnalysisofcongestionTheOriginofCongestion
• Somepacketssimultaneouslyrequestthesameoutput portwithinaswitch.• Apacketcanbeforwardedwhiletheother(s)wait(s),sincetransferencespeedisdeterminedbytheoutputlink.
Contention
• Persistentcontention duringtime.• Bufferscontainingblockedpacketsfillup atingressandegressport,andcongestionappears.
Congestion appears
AnalysisofcongestionTheOriginofCongestion
CongestionRoot
PFC
AnalysisofcongestionTheOriginofCongestion
• Inlosslessnetworks,congestionpropagatesquickly duetobuffersbackpressure.Congestiontrees growupinthisway.
• Differentcongestiontreesdynamicsmakesmorecomplexthecongestionmanagement[Garciaetal.05].
Root of Congestion
Branch
Branch
Leaf
Leaf
Congestion can reach the sources
AnalysisofcongestionTheOriginofCongestion
Pedro J. García, J. Flich, J. Duato, I. Johnson, F. J. Quiles, F. Naven: Dynamic Evolution of Congestion Trees: Analysis and Impact on Switch Architecture. HiPEAC 2005: 266-285
PFC
AnalysisofcongestionCongestiontreesdynamics
• Ingeneral,theswitchwherecongestionoriginatescouldbelocatedatsomeinitialorintermediatestageorbedirectlyconnectedtoendnodes.
Root of Congestion
Incast congestion
Root of Congestion
In-networkcongestion
• Itusuallyoccurswhencongestionislight(i.e.itexceedsavailablelinkbandwidthbyasmallintegerfactoratmost).• Therearetwobasicscenarios:
1. Afewnodesinjectingtrafficatfullratetowardsthesamedestination.
2. Manynodesinjectingtrafficatlowratestowardsthesamedestination.
• Egressportsofin-networkcongestedswitchesworkatfullcapacityandmaycontendwithotherflowsforupstreamswitches,movingtherootofcongestionupwards.
AnalysisofcongestionIn-NetworkCongestion
Root of Congestion
In-networkcongestion
• Traditionalapproaches:spreadtrafficflowsacrossthemultiplepathsinordertobalancetheloadandhopefullyavoidcongestion(loadbalancing).• Problems:
1. Spreadingtrafficdonottakeintoaccountwhethertheselectedpathiscongested,generatingcollisionsoftrafficflowsinpathsalready congested.
2. Thenatureofflowsmatters:elephantflowsincreasethechanceofcreatingin-networkcongestion.
3. Traditionalloadbalancing(e.g.ECMP)donotworkwhereincast congestionappear.
AnalysisofcongestionIn-NetworkCongestion
• Manynodesstarttosendpacketsatfullratetowardsthesamedestination,almostatthesametime(e.g.OLDIservices)• Incast congestionoccursattheToR switchwherethenodethatmultiplepartiesaresynchronizingwithisconnected,andgrowsfromToRswitchestodownstreamswitches.• InCLOSnetworksmanysmallcongestiontreesconcurrentlyappearatfirst-stageswitches,latermergingatsecond-stageswitchesandformingseverallargercongestiontrees.
AnalysisofcongestionIncast Congestion
Root of Congestion
Incast congestion
• Traditionalapproach:TheDCNnetworkequipmentsimplyreactstoincast usingECN+PFCandsmartbuffermanagementinanattempttominimizepacketloss.
• Problems:1. LargeDCNnetworkshave
morehops,increasingtheclosed-loopreactiontimeofECN.
2. MoretrafficinflightmakesitdifficultforECNtoreact untilsuddentrafficbursts.
3. PFCgeneratesHoL blocking inupstreamswitches
4. ECNmaybetriggeredatsourcesnotcontributingtocongestion
AnalysisofcongestionIncast Congestion
Non-congested fllows advance at
the same speed as congested ones
Congestion affectssources that do notcause congestion
PFC
AnalysisofcongestionProposedTechnologies
• Asmallnumberoflongdurationelephantflowscanaligninsuchawaytocreatequeuingdelaysforthelargernumberofshortbutcriticalmiceflows.• Traditionalloadbalancing(i.e.ECMP)andECN+PFCarenotenoughwhenin-networkandincast congestionappearinDCNnetworks.• In-networkcongestioncanbereducedbysuitably:
• Dimensioningnetworkbisectionbandwidth.• Applyingcleverbuffersorganization.• Usingsomeformofload-awaretrafficbalancingatthesources.
• Incast congestioncanbealleviatedbyusingdestinationscheduling.• Proposedtechnologieshavealsolimitationswhencongestionappears.
Paul Congdon et al: The Lossless Network for Data Centers. NENDICA “Network Enhancements for the Next Decade” Industry Connections Activity, IEEE Standards Association, 2018.
Outline
• Whycongestionisolationisneeded?• Analysisofcongestionscenarios• Limitationsofcurrenttechnologies• CongestionIsolationinDCNs• Conclusions
LimitationsofcurrenttechnologiesLoadbalancing
• Techniquetoavoidin-networkcongestion.• Ineffectiveapproachescanactuallydotheopposite.• Loadbalancingselectsapathbyhashingtheflowidentityfieldsintheroutedpacketsuchthatallpacketsfromaparticularflowtraversethesamepath.• EqualCostMulti-Path (ECMP)routing:Flowgranularityisaproblemthatmaygenerateelephantflowstotraverseandoccupyarouteinthenetworkforalongperiodoftime.
• ECNmechanismmayreduceinjectionrateofelephantflows,butduringtheclosed-looptransientperiodtheymayinterferewithmiceflows,slowingdowntheiradvance.
• Toovercometheseissues,severalideasfocusonreducinggranularityofflowstomakebetterloadbalancingdecisions,basedonmeasuringthecongestedpaths.• Thegranularityofloadbalancinghastrade-offsbetweentheuniformityofthedistributionandcomplexityassociatedwithassuringdataisdeliveredinitsoriginalorder.• Theyrequiresomeformofsignallingcongestiontothesources.• Balancingcongestedpacketsthroughalternativeroutesmayendupmovingcongestionrootsneartoendnodes,transformingin-networkcongestioninincast congestion
LimitationsofcurrenttechnologiesLoad-awarepacket-levelbalancing
Rocher-Gonzalez, J., Escudero-Sahuquillo, J., Garcia, P.J., Quiles, F. On the Impact of Routing Algorithms in the Effectiveness of Queuing Schemes in High-Performance Interconnection Networks. In Proc. of IEEE HoTI 2017.
In-networkcongestionina3-tierCLOS
Root of Congestion
Load-aware packet-balancingdecision based on
congestion indicators
Alternative path to
destination
X
Packets Reordering
LimitationsofcurrenttechnologiesLoad-awarepacket-levelbalancing
LimitationsofcurrenttechnologiesDestinationscheduling
• Thenetworkcanassistineliminatingpacketlossatthedestinationbyschedulingtrafficdeliverywhenitwouldotherwisebelost.• TraditionalTCPassessthebandwidthandresourceavailabilitybymeasuringfeedbackthroughacks (itworkswithlightload)
• Onceincast congestionappearsatthedestination,delaysincreaseandbuffersoverflow,throughputislostandlatencyrises.• TraditionalTCPcannotreactquickenoughtohandleincast
• Solution:Requestingdatafromthesourceataratethatitcanbeconsumedwithoutloss.
• Sourcesrequest (send)directlyasmallamountofunscheduleddatatotheirdestinations.• Destinationsscheduleagrant response,bymeansofACKs,whenresourcesareavailabletoreceivetheentiretransfer.• ThereisaRTTrequest-grantdelaythatmayincreaseduringincast situations.• Solution:Sourcesmonitorthelevelofcongestioninthenetwork(light,moderateandhigh)andscheduledatainjectionaccordingtothelevelofcongestion.
LimitationsofcurrenttechnologiesLoad-awaredestinationscheduling
• Itispossibletocombinedestinationschedulingandloadbalancing,dependingonwhetherincast orin-networkcongestionismonitored.• Sourcesmeasureifthecongestionislight,moderateorhigh,applyingdifferentinjectionrates.• Theideaistoworkinload-awarebalancingmodeuntilincast congestionappear.Whenthishappens,thenetworkswitchestodestination-schedulingmode.• ThefrequencyuseofPFCandECNisreduced
LimitationsofcurrenttechnologiesCombinedload-awaredestinationschedulingandbalancing
Incast congestionina3-tierCLOS
LimitationsofcurrenttechnologiesDestinationscheduling
Root of Congestion
ECN-marked packets received
a certain rateSources inject a
moderate amount of data and apply
load balancing
Moderate load
Light load
Sources schedule big amount of
data
High load
Sources inject a small amount of data (granted by
destination)
X
• Thesetechnologiesmayworktogethertoeliminatelossintheclouddatacenternetwork[1].• Load-balancinganddestinationschedulingareend-to-endsolutionsincurringintheRTTdelayswhencongestionappear• However,thereisnotimeforlossinthenetworkduetocongestionandcongestiontreesgrowveryquickly.• TransientcongestionmaystillproduceHoL blockingthatleadstoincreaselatency,lowerthroughputandbuffersoverflow,significantlydegradingperformance.• Evenusingthesemechanisms,westillneedsomethingtodealwithHOLBlockinglocallyandfast.
LimitationsofcurrenttechnologiesConsequences
Outline
• Whycongestionisolationisneeded?• Analysisofcongestionscenarios• Limitationsofcurrenttechnologies• CongestionIsolationinDCNs• Conclusions
CongestionIsolationinDCNsMotivation
• CIisneededtoreactlocallyandveryfasttoimmediatelyeliminateHoL blocking.• PrevioustechnologiesreducetheuseofPFCandECN,buttheirclosed- andopen-loopapproachcausedelaysstillhappening.• Congestiontreesappearsuddenly,aredifficulttopredict (evenworsewhenloadbalancingisapplied)andgrowquickly.• CIcanbeappliedincombinationtotheprevioustechnologies,improvingtheirbehavior.
• CIworkswellwhencombinedwithotherCCmechanisms(e.g.e2econgestioncontrol)[1].• Loadbalancingmakesitdifficulttopredictwhenandwherecongestionpointsarise.• DestinationSchedulinghasRTT-delaysthatmaymakefeedbackinformationobsoletebythetimeitreachesthesources.• CIwillcomplementthesetechnologiesworkingtogether,makingthembehavebetter.
CongestionIsolationinDCNsImprovementsoncurrenttechnologies
[1] Jesús Escudero-Sahuquillo, Ernst Gunnar Gran, Pedro Javier García, Jose Flich, Tor Skeie, OlavLysne, Francisco J. Quiles, José Duato: Combining Congested-Flow Isolation and Injection Throttlingin HPC Interconnection Networks. ICPP 2011: 662-672
• Loadbalancing:• CIcomplementsloadbalancingaslocalandfastcongestedflowsisolationreducestheHoL blockingprobabilitieswhenloadbalancingisappliedthroughouttheentirenetwork.• Betterdecisionsforloadbalancingcanbemadeoncethecongestedflowsareisolated.
• Destinationscheduling:• TransientperiodswheregrantsaresentfromdestinationstosourcescanbecomplementedwithCI.• FastandlocalisolationofcongestedflowsreduceRTT-delaysofgrants.
CongestionIsolationinDCNsImprovementsoncurrenttechnologies
• DotheotherscomplementCI?Yes,theymakepossibletokeeptheCIrequiredresourceslow.• CIrequireadditionalresourcestokeeptrackofcongestiontrees atswitches.• Ifthenumberofcongestionspotsgrows,switchesmayenduprunningoutofresourcestokeeptrackofthem.• LoadbalancinganddestinationschedulingstrategieswilldraincongestiontreesfasterthanusingPFC+ECN.• Theywillcomplement(andimprove)theCIbehavior.
CongestionIsolationinDCNsCurrenttechnologiesalsoimproveCI
Jesús Escudero-Sahuquillo, Ernst Gunnar Gran, Pedro Javier García, Jose Flich, Tor Skeie, Olav Lysne, Francisco J. Quiles, José Duato: Efficient and Cost-Effective Hybrid Congestion Control for HPC Interconnection Networks. IEEE Trans. Parallel Distrib. Syst. 26(1): 107-119(2015)
Outline
• Whycongestionisolationisneeded?• Analysisofcongestionscenarios• Limitationsofcurrenttechnologies• CongestionIsolationinDCNs• Conclusions
Conclusions
• ThereisalotofworkdoneinDCNstodealwithcongestionandHoL blocking.• Existingsolutionsworkend-to-end,sothattransientcongestionmaystillspoilnetworkperformance.• CIprovidesafastreactiontocongestionandHoLblocking.• Infact,CIcanworkincooperationwithotherapproachesproposedtodealwithcongestion,improvingtheirbehavior.• Inaddition,theproposedapproachescanalsoworkincooperationwithCI,increasingitsbenefits.• Itisveryinterestingtoexplorethesynergyofallthesetechniquesworkingtogether.
JesusEscudero-Sahuquillo,PedroJavierGarcia,FranciscoJ.Quiles,JoseDuato
IEEE802.1PlenaryMeeting,SanDiego,CA,USAJuly8,2018
P802.1Qczinterworkingwithotherdatacentertechnologies
Backupslides
ContentionswitchA
AnalysisofcongestionIn-NetworkCongestiontransitiontoIncast
• Consideranalreadyformedcongestiontreewhoserootislocatedatsomeintermediateswitchegressport,whichisconnectedtoadownstreamswitch
• Assumeanotherflowreachingthatdownstreamswitchthroughadifferentingressportanddestinedtothesamenodeastheflowsintheexistingtree
• Iftheaggregatedbandwidthrequiredbytherootofthecongestiontreeandtheadditionalflowexceedlinkbandwidth,congestionwillbedetectedatsomeegressportofdownstreamswitch
• Theexistingcongestiontreeismergedwithanewbranch,movingtherootofthecongestiontreetoanegressportofdownstreamswitchA
Congestion trees merge in downstream
direction
switchA
AnalysisofcongestionIn-NetworkCongestiontransitiontoIncast