6.888: Lecture 2
Data Center Network Architectures
Mohammad Alizadeh
Spring 2016
Slides adapted from presentations by Albert Greenberg and Changhoon Kim (Microsoft)
Data Center Costs
Amortized Cost* | Component | Sub-Components
~45% | Servers | CPU, memory, disk
~25% | Power infrastructure | UPS, cooling, power distribution
~15% | Power draw | Electrical utility costs
~15% | Network | Switches, links, transit
*3 yr amortization for servers, 15 yr for infrastructure; 5% cost of money
The Cost of a Cloud: Research Problems in Data Center Networks. SIGCOMM CCR 2009. Greenberg, Hamilton, Maltz, Patel.
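The amortization footnote can be made concrete with a standard annuity calculation. The sketch below is an illustration only: the purchase prices are hypothetical, and treating the 5% cost of money as an annual rate compounded monthly is an assumption that the slide does not spell out.

```python
def monthly_cost(price, years, annual_rate=0.05):
    """Level monthly payment that repays `price` over `years` at `annual_rate`."""
    r = annual_rate / 12          # assumed: monthly compounding
    n = years * 12                # number of monthly payments
    return price * r / (1 - (1 + r) ** -n)

# Hypothetical prices, chosen only to show the effect of the two terms.
server = monthly_cost(2_000, years=3)
infra = monthly_cost(2_000, years=15)
print(f"3-year server term:          ${server:.2f}/month")
print(f"15-year infrastructure term: ${infra:.2f}/month")
```

For the same purchase price, the short server term yields a much higher monthly charge, which is one reason servers dominate the amortized breakdown.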
Server Costs
Ugly secret: 30% utilization considered “good” in data centers
Uneven application fit
– Each server has CPU, memory, disk: most applications exhaust one resource, stranding the others
Long provisioning timescales
– New servers purchased quarterly at best
Uncertainty in demand
– Demand for a new service can spike quickly
Risk management
– Not having spare servers to meet demand brings failure just when success is at hand
Session state and storage constraints
– If the world were stateless servers, life would be good
Goal: Agility – Any Service, Any Server
Turn the servers into a single large fungible pool
– Dynamically expand and contract service footprint as needed
Benefits
– Increase service developer productivity
– Lower cost
– Achieve high performance and reliability
The 3 motivators of most infrastructure projects
Achieving Agility
Workload management
– Means for rapidly installing a service’s code on a server
– Virtual machines, disk images, containers
Storage management
– Means for a server to access persistent data
– Distributed filesystems (e.g., HDFS, blob stores)
Network
– Means for communicating with other servers, regardless of where they are in the data center
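Conventional DC Network
[Figure: tree topology with the Internet above core routers (CR), access routers (AR), Ethernet switches (S), and racks (A); the upper tiers are labeled DC-Layer 3 and the lower tiers DC-Layer 2]
Key
• CR = Core Router (L3)
• AR = Access Router (L3)
• S = Ethernet Switch (L2)
• A = Rack of app. servers
~1,000 servers/pod == IP subnet
Reference – “Data Center: Load balancing Data Center Services”, Cisco 2004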
Layer 2 vs. Layer 3
Ethernet switching (layer 2)
✓ Fixed IP addresses and auto-configuration (plug & play)
✓ Seamless mobility, migration, and failover
✗ Broadcast limits scale (ARP)
✗ Spanning Tree Protocol
IP routing (layer 3)
✓ Scalability through hierarchical addressing
✓ Multipath routing through equal-cost multipath
✗ More complex configuration
✗ Can’t migrate w/o changing IP address
Conventional DC Network Problems
[Figure: the conventional CR / AR / S / A topology annotated with oversubscription ratios of ~5:1, ~40:1, and ~200:1]
Dependence on high-cost proprietary routers
Extremely limited server-to-server capacity
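To see what these ratios mean for an individual server, the sketch below divides a NIC rate by each oversubscription ratio. The 1 Gbps NIC speed is an assumption made for illustration; the slide gives only the ratios.

```python
NIC_MBPS = 1_000  # assumed 1 Gbps server NIC (not stated on the slide)

def worst_case_mbps(oversubscription, nic_mbps=NIC_MBPS):
    """Per-server bandwidth when every server behind the bottleneck sends at once."""
    return nic_mbps / oversubscription

ratios = [5, 40, 200]
for ratio in ratios:
    print(f"~{ratio}:1 -> {worst_case_mbps(ratio):.0f} Mbps per server")
```

Under that assumption, traffic that has to cross the most oversubscribed tier is limited to a few Mbps per server when all servers are active.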
And More Problems…
[Figure: the conventional topology with ~200:1 oversubscription; racks (A) are split between IP subnet (VLAN) #1 and IP subnet (VLAN) #2]
• Resource fragmentation, significantly lowering cloud utilization (and cost-efficiency)
• Complicated manual L2/L3 re-configuration
Measurements
DC Traffic Characteristics
Instrumented a large cluster used for data mining and identified distinctive traffic patterns
Traffic patterns are highly volatile
– A large number of distinctive patterns even in a day
Traffic patterns are unpredictable
– Correlation between patterns very weak
Traffic-aware optimization needs to be done frequently and rapidly
DC Opportunities
DC controller knows everything about hosts
Host OS’s are easily customizable
Probabilistic flow distribution would work well enough, because…
– Flows are numerous and not huge – no elephants
– Commodity switch-to-switch links are substantially thicker (~10x) than the maximum thickness of a flow
DC network can be made simple
Intuition
Higher speed links improve flow-level load balancing (ECMP)
Example: 11×10Gbps flows (55% load)
– 20×10Gbps uplinks: Prob of 100% throughput = 3.27%
– 2×100Gbps uplinks: Prob of 100% throughput = 99.95%
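The probabilities in this example can be checked with a short exact calculation. The sketch below assumes each flow is hashed independently and uniformly onto an uplink, and that full throughput means no uplink receives more 10 Gbps flows than it can carry; the function name is illustrative.

```python
from fractions import Fraction
from math import comb

def prob_full_throughput(n_flows, n_links, flows_per_link):
    """Probability that no link is assigned more than `flows_per_link` flows."""
    # ways[m] = number of ways to place m labelled flows on the links
    # processed so far without exceeding any link's capacity
    ways = [1] + [0] * n_flows
    for _ in range(n_links):
        new = [0] * (n_flows + 1)
        for placed, count in enumerate(ways):
            if count == 0:
                continue
            remaining = n_flows - placed
            for j in range(min(flows_per_link, remaining) + 1):
                # choose which j of the remaining flows land on this link
                new[placed + j] += count * comb(remaining, j)
        ways = new
    return Fraction(ways[n_flows], n_links ** n_flows)

p20 = prob_full_throughput(11, n_links=20, flows_per_link=1)   # 20 x 10 Gbps
p2 = prob_full_throughput(11, n_links=2, flows_per_link=10)    # 2 x 100 Gbps
print(f"20 x 10G uplinks: {float(p20):.2%}")
print(f"2 x 100G uplinks: {float(p2):.2%}")
```

Under these assumptions the 20-uplink case reproduces the 3.27% figure. The 2-uplink case comes out at about 99.90%, slightly below the quoted 99.95%; the quoted value equals 1 − 2^−11, which counts the overload of one specific uplink rather than of either. The conclusion is the same in both readings: a few fat links balance far better than many thin ones.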
What You Said
“In 3.2, the paper states that randomizing large flows won't cause much perpetual congestion if misplaced since large flows are only 100 MB and thus take 1 second to transmit on a 1 Gbps link. Isn't 1 second sufficiently high to harm the isolation that VL2 tries to provide?”
VL2 Goals
Virtual Layer 2 Switch
1. L2 semantics
2. Uniform high capacity
3. Performance isolation
[Figure: racks of servers (A) all attached to a single virtual layer 2 switch]
VL2 Design Principles
Randomizing to Cope with Volatility
– Tremendous variability in traffic matrices
Separating Names from Locations
– Any server, any service
Embracing End Systems
– Leverage the programmability & resources of servers
– Avoid changes to switches
Building on Proven Networking Technology
– Build with parts shipping today
– Leverage low cost, powerful merchant silicon ASICs, though do not rely on any one vendor
Single-Chip “Merchant Silicon” Switches
[Figure: a switch ASIC and the Wedge and 6-pack switches. Image courtesy of Facebook]
Specific Objectives and Solutions
Objective | Approach | Solution
1. Layer-2 semantics | Employ flat addressing | Name-location separation & resolution service
2. Uniform high capacity between servers | Guarantee bandwidth for hose-model traffic | Flow-based random traffic indirection (Valiant LB)
3. Performance isolation | Enforce hose model using existing mechanisms only | TCP
Discussion
What You Said
“It is interesting that this paper is from 2009. It seems that a large number of the suggestions in this paper are used in practice today.”
What You Said
“For address resolution, why not have applications use hostnames and use DNS to resolve hostnames to IP addresses (the mapping from hostname to IP could be updated when a service moved)? Is the directory system basically just DNS but with IPs instead of hostnames?”
“it was unclear why the hash of the 5 tuple is required.”
Addressing and Routing: Name-Location Separation
Servers use flat names
Switches run link-state routing and maintain only switch-level topology
Cope with host churns with very little overhead
[Figure: servers x, y, z attached to ToR1 to ToR4; the sender performs a Lookup & Response with the Directory Service, then sends packets that carry the destination ToR in front of the server name, e.g. (ToR3, y, payload) and (ToR4, z, payload)]
Directory Service entries: x → ToR2, y → ToR3, z → ToR4
Directory Service entries once z is reached via ToR3: x → ToR2, y → ToR3, z → ToR3
• Allows the use of low-cost switches
• Protects network and hosts from host-state churn
• Obviates host and switch reconfiguration
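The lookup-then-encapsulate step can be sketched in a few lines. This is a minimal illustration of name-location separation, not VL2's implementation: the `DirectoryService` and `Packet` classes and the `send` helper are hypothetical, and details such as caching and replication are left out.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    outer_dst: str   # locator: the ToR switch the network routes on
    inner_dst: str   # flat server name, meaningful only to end hosts
    payload: bytes

class DirectoryService:
    """Hypothetical in-memory stand-in for the name-to-location mapping."""
    def __init__(self):
        self._location = {}

    def update(self, name, tor):
        self._location[name] = tor

    def lookup(self, name):
        return self._location[name]

def send(directory, dst_name, payload):
    """Sender-side step: resolve the flat name, then encapsulate toward its ToR."""
    return Packet(directory.lookup(dst_name), dst_name, payload)

directory = DirectoryService()
for name, tor in [("x", "ToR2"), ("y", "ToR3"), ("z", "ToR4")]:
    directory.update(name, tor)

before = send(directory, "z", b"hello")
directory.update("z", "ToR3")        # z moves; only the directory entry changes
after = send(directory, "z", b"hello")
print(before.outer_dst, "->", after.outer_dst)
```

The server name `z` never changes; only the mapping does, so switches need to know routes to ToRs and nothing about individual hosts.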
Example Topology: Clos Network
[Figure: Clos network with intermediate (Int) switches at the top, K aggregation (Aggr) switches with D ports each, and ToR switches with 20 servers each, for 20*(DK/4) servers in total]
Offer huge aggr capacity and multipaths at modest cost
D (# of 10G ports) | Max DC size (# of servers)
48 | 11,520
96 | 46,080
144 | 103,680
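The table follows directly from the 20*(DK/4) formula. The sketch below assumes K = D, which is not stated on the slide but reproduces all three rows; the function name is illustrative.

```python
SERVERS_PER_TOR = 20

def max_servers(d_ports, k_aggr=None):
    """Servers supported by the Clos topology: 20 * (D * K / 4)."""
    k = d_ports if k_aggr is None else k_aggr   # assumed K = D, matching the table
    return SERVERS_PER_TOR * d_ports * k // 4

for d in (48, 96, 144):
    print(f"D = {d:3d}: {max_servers(d):,} servers")
```

Because capacity grows with D times K, doubling the port count of the switches quadruples the maximum data center size.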
Traffic Forwarding: Random Indirection
Cope with arbitrary TMs with very little overhead
[Figure: ToRs T1 to T6 below intermediate switches that all share the anycast address I_ANY; x sends to y with a packet addressed (T3, y, payload) and to z with a packet addressed (T5, z, payload), each bounced off an intermediate switch; links are marked as used for up paths or for down paths]
[ECMP + IP Anycast]
• Harness huge bisection bandwidth
• Obviate esoteric traffic engineering or optimization
• Ensure robustness to failures
• Work with switch mechanisms available today
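Flow-based random indirection can be sketched as a hash over the flow's 5-tuple: packets of one flow always pick the same intermediate switch, so they stay in order, while different flows spread across all switches. The switch names and addresses below are made up, and SHA-256 is only a stand-in for whatever hash function the switch hardware applies.

```python
import hashlib
from collections import Counter

INTERMEDIATES = ["Int1", "Int2", "Int3", "Int4"]   # hypothetical switch names

def pick_intermediate(five_tuple, switches=INTERMEDIATES):
    """Hash the flow's 5-tuple so every packet of a flow takes the same path."""
    key = "|".join(map(str, five_tuple)).encode()
    digest = hashlib.sha256(key).digest()
    return switches[int.from_bytes(digest[:8], "big") % len(switches)]

# (src IP, dst IP, protocol, src port, dst port); the addresses are made up.
flows = [("10.0.0.1", "10.0.1.7", "tcp", 40000 + i, 80) for i in range(1000)]
spread = Counter(pick_intermediate(f) for f in flows)
print(spread)
```

Hashing per flow rather than per packet avoids reordering within a TCP connection, at the cost of balancing load only when flows are numerous and individually small relative to link capacity.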
What You Said
“…the heterogeneity of racks and the incremental deployment of new racks may introduce asymmetry to the topology. In this case, more delicate topology design and routing algorithms are needed.”
Some other DC network designs…
Fat-tree [SIGCOMM’08]
Jellyfish (random) [NSDI’12]
BCube [SIGCOMM’09]
Next time: Congestion Control