Table of Contents

Introduction
Disclaimer
About the author
Introduction to HA
Components of HA
Fundamental Concepts
Restarting Virtual Machines
Virtual SAN and Virtual Volumes specifics
Adding resiliency to HA
Admission Control
VM and Application Monitoring
vSphere HA and ...
Use Case - Stretched Clusters
Advanced Settings
Summarizing
Changelog
VMware vSphere 6.x HA Deepdive

Like many of you I am constantly trying to explore new ways to share content with the rest of the world. Over the course of the last decade I have done this in many different formats; some of them were easy to do and others not so much. Books always fell in that last category, which is a shame as I have always enjoyed writing them.
I wanted to explore the different options there are to create content and share it in different ways, without the need to re-do formatting and waste a lot of time on things I do not want to waste time on. After an afternoon of reading and researching, GitBook popped up. It looked like an interesting platform/solution that would allow me to create content both online and offline, push and pull it to and from a repository, and build both a static website from it as well as publish it in a variety of different formats.
Let it be clear that this is a trial, and this may or may not result in a follow-up. I am starting with the vSphere High Availability content as that is what I am most familiar with and will be easiest to update.
A special thanks goes out to everyone who has contributed in any shape or form to this project. First of all Frank Denneman, the person with whom I wrote the first 3 versions of the Clustering Deepdive and who designed all the great diagrams which you find throughout this publication. Of course also: Doug Baer for editing the content in the past, and my technical conscience: Keith Farkas, Cormac Hogan, Manoj Krishnan, Anne Holler, Mustafa Uysal and Gabriel Tarasuk-Levin.
For offline reading, feel free to download this publication in any of the following formats: PDF - ePub - Mobi.
The source of this publication is stored on both GitBook as well as GitHub. Feel free to submit/contribute where possible and needed. Note that it is also possible to leave feedback on the content by simply clicking on the "+" on the right side of the paragraph you want to comment on (hover over it with your mouse). I will read and incorporate feedback as soon as I have time, hence it is useful to check back regularly and validate your downloaded version against the details below.
vSphere 6.x HA Deepdive, book version: 1.0.4. Book built with GitBook version: 2.6.7.
Thanks for reading, and enjoy!
Duncan Epping
Chief Technologist Storage and Availability - VMware
Disclaimer

Although every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions. Neither is any liability assumed for damages resulting from the use of the information contained herein.
The author of this publication works for VMware. The opinions expressed here are the author's personal opinions. Content published was not approved in advance by VMware and does not necessarily reflect the views and opinions of VMware. This is the author's book, not VMware's.
Copyrights / Licensing
Figure 1 - Creative Commons License
About the Author

Duncan Epping is a Chief Technologist working in the Office of CTO of VMware's Storage and Availability business unit. In that role, he serves as a partner and trusted adviser to VMware's customers, primarily in EMEA. His main responsibilities are ensuring VMware's future innovations align with essential customer needs and translating customer problems into opportunities. Duncan specializes in Software Defined Storage, hyper-converged infrastructures and business continuity/disaster recovery solutions. He has 1 patent granted and 4 patents pending on the topics of availability, storage and resource management. Duncan is a VMware Certified Design Expert (VCDX 007) and the main author and owner of the VMware/virtualization blog Yellow-Bricks.com.
He can be followed on Twitter at @DuncanYB.
Introduction to vSphere High Availability

Availability has traditionally been one of the most important aspects when providing services. When providing services on a shared platform like VMware vSphere, the impact of downtime grows exponentially, as many services run on a single physical machine. As such, VMware engineered a feature called VMware vSphere High Availability. VMware vSphere High Availability, hereafter simply referred to as HA, provides a simple and cost effective solution to increase availability for any application running in a virtual machine, regardless of its operating system. It is configured using a couple of simple steps through vCenter Server (vCenter) and as such provides a uniform and simple interface. HA enables you to create a cluster out of multiple ESXi hosts. This will allow you to protect virtual machines and their workloads. In the event of a failure of one of the hosts in the cluster, impacted virtual machines are automatically restarted on other ESXi hosts within that same VMware vSphere Cluster (cluster).
Figure 2 - High Availability in action
On top of that, in the case of a Guest OS level failure, HA can restart the failed Guest OS. This feature is called VM Monitoring, but is sometimes also referred to as VM-HA. This might sound fairly complex, but again can be implemented with a single click.
Figure 3 - OS Level HA just a single click away
Unlike many other clustering solutions, HA is a simple solution to implement and literally enabled within 5 clicks. On top of that, HA is widely adopted and used in all situations. However, HA is not a 1:1 replacement for solutions like Microsoft Clustering Services / Windows Server Failover Clustering (WSFC). The main difference between WSFC and HA is that WSFC was designed to protect stateful cluster-aware applications while HA was designed to protect any virtual machine regardless of the type of workload within, but it can also be extended to the application layer through the use of VM and Application Monitoring.
In the case of HA, a fail-over incurs downtime as the virtual machine is literally restarted on one of the remaining hosts in the cluster, whereas WSFC transitions the service to one of the remaining nodes in the cluster when a failure occurs. Contrary to what many believe, WSFC does not guarantee that there is no downtime during a transition. On top of that, your application needs to be cluster-aware and stateful in order to get the most out of this mechanism, which limits the number of workloads that could really benefit from this type of clustering.
One might ask why you would want to use HA when a virtual machine is restarted and service is temporarily lost. The answer is simple: not all virtual machines (or services) need 99.999% uptime. For many services, the type of availability HA provides is more than sufficient. On top of that, many applications were never designed to run on top of a WSFC cluster. This means that there is no guarantee of availability or data consistency if an application is clustered with WSFC but is not cluster-aware.
In addition, WSFC clustering can be complex and requires special skills and training. One example is managing patches and updates/upgrades in a WSFC environment; this could even lead to more downtime if not operated correctly and definitely complicates operational procedures. HA however reduces complexity, costs (associated with downtime and WSFC), resource overhead and unplanned downtime for minimal additional costs. It is important to note that HA, contrary to WSFC, does not require any changes to the guest as HA is provided on the hypervisor level. Also, VM Monitoring does not require any additional software or OS modifications except for VMware Tools, which should be installed anyway as a best practice. In case even higher availability is required, VMware also provides a level of
application awareness through Application Monitoring, which has been leveraged by partners like Symantec to enable application level resiliency and could be used by in-house development teams to increase resiliency for their applications.
HA has proven itself over and over again and is widely adopted within the industry; if you are not using it today, hopefully you will be convinced after reading this section of the book.
vSphere 6.0

Before we dive into the main constructs of HA and describe all the choices one has to make when configuring HA, we will first briefly touch on what's new in vSphere 6.0 and describe the basic requirements and steps needed to enable HA. This book covers all the released versions of what is known within VMware as "Fault Domain Manager" (FDM), which was introduced with vSphere 5.0. We will call out the differences in behavior in the different versions where applicable; our baseline however is vSphere 6.0.
What's New in 6.0?
Compared to vSphere 5.0, the changes introduced with vSphere 6.0 for HA appear to be minor. However, some of the new functionality will make the life of many of you much easier. Although the list is relatively short, from an engineering point of view many of these things have been an enormous effort, as they required changes to the deep fundamentals of the HA architecture.
Support for Virtual Volumes - With Virtual Volumes a new type of storage entity is introduced in vSphere 6.0. This has also resulted in some changes in the HA architecture to accommodate this new way of storing virtual machines.
Support for Virtual SAN - This was actually introduced with vSphere 5.5, but as it is new to many of you and led to changes in the architecture, we decided to include it in this update.
VM Component Protection - This allows HA to respond to a scenario where the connection to the virtual machine's datastore is impacted temporarily or permanently:
  HA "Response for Datastore with All Paths Down"
  HA "Response for Datastore with Permanent Device Loss"
Increased host scale - Cluster limit has grown from 32 to 64 hosts.
Increased VM scale - Cluster limit has grown from 4000 VMs to 8000 VMs per cluster.
Secure RPC - Secures the VM/App monitoring channel.
Full IPv6 support.
Registration of "HA Disabled" VMs on hosts after failure.
What is required for HA to Work?

Each feature or product has very specific requirements and HA is no different. Knowing the requirements of HA is part of the basics we have to cover before diving into some of the more complex concepts. For those who are completely new to HA, we will also show you how to configure it.
Prerequisites
Before enabling HA it is highly recommended to validate that the environment meets all the prerequisites. We have also included recommendations from an infrastructure perspective that will enhance resiliency.
Requirements:
Minimum of two ESXi hosts
Minimum of 5 GB memory per host to install ESXi and enable HA
VMware vCenter Server
Shared storage for virtual machines
Pingable gateway or other reliable address
Recommendations:
Redundant management network (not a requirement, but highly recommended)
8 GB of memory or more per host
Multiple shared datastores
Firewall Requirements
The following table contains the ports that are used by HA for communication. If your environment contains firewalls external to the host, ensure these ports are opened for HA to function correctly. HA will open the required ports on the ESX or ESXi firewall.
Port Protocol Direction
8182 UDP Inbound
8182 TCP Inbound
8182 UDP Outbound
8182 TCP Outbound
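To verify that an external firewall is not blocking the TCP rows of the table above, a quick reachability probe can be run from any machine on the management network. This is a minimal sketch, not a VMware tool; the host name in the usage comment is hypothetical, and note that the UDP rows cannot be verified with a simple connect, since UDP gives no handshake.

```python
import socket

def tcp_port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the
    timeout. Only useful for the TCP 8182 rows of the firewall table."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (hypothetical host name):
# tcp_port_reachable("esxi01.lab.local", 8182)
```

A `True` result only proves the port is reachable from where you ran the probe; intermediate firewalls may still block other paths between hosts.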
Configuring vSphere High Availability
HA can be configured with the default settings within a couple of clicks. The following steps will show you how to create a cluster and enable HA, including VM Monitoring, using the vSphere Web Client. Each of the settings and the design decisions associated with these steps will be described in more depth in the following chapters.
1. Click "Hosts & Clusters" under Inventories on the Home tab.
2. Right-click the Datacenter in the Inventory tree and click New Cluster.
3. Give the new cluster an appropriate name. We recommend at a minimum including the location of the cluster and a sequence number, e.g. ams-hadrs-001.
4. Select Turn On vSphere HA.
5. Ensure "Enable host monitoring" and "Enable admission control" are selected.
6. Select "Percentage of cluster resources..." under Policy and specify a percentage.
7. Enable VM Monitoring Status by selecting "VM and Application Monitoring".
8. Click "OK" to complete the creation of the cluster.
Figure 4 - Ready to complete the New Cluster Wizard
When the HA cluster has been created, the ESXi hosts can be added to the cluster simply by right-clicking the host and selecting "Move To", if they were already added to vCenter, or by right-clicking the cluster and selecting "Add Host".
When an ESXi host is added to the newly-created cluster, the HA agent will be loaded and configured. Once this has completed, HA will enable protection of the workloads running on this ESXi host.
As we have clearly demonstrated, HA is a simple clustering solution that will allow you to protect virtual machines against host failure and operating system failure in literally minutes. Understanding the architecture of HA will enable you to reach that extra 9 when it comes to availability. The following chapters will discuss the architecture and fundamental concepts of HA. We will also discuss all decision-making moments to ensure you will configure HA in such a way that it meets the requirements of your or your customer's environment.
Components of High Availability

Now that we know what the prerequisites are and how to configure HA, the next step is describing which components form HA. Keep in mind that this is still a "high level" overview. There is more under the covers that we will explain in the following chapters. The following diagram depicts a two-host cluster and shows the key HA components.
Figure 5 - Components of High Availability
As you can clearly see, there are three major components that form the foundation for HA as of vSphere 6.0:
FDM
HOSTD
vCenter
The first and probably the most important component that forms HA is FDM (Fault Domain Manager). This is the HA agent.
The FDM agent is responsible for many tasks, such as communicating host resource information, virtual machine states and HA properties to other hosts in the cluster. FDM also handles heartbeat mechanisms, virtual machine placement, virtual machine restarts, logging and much more. We are not going to discuss all of this in-depth separately as we feel that this will complicate things too much.
FDM, in our opinion, is one of the most important agents on an ESXi host, when HA is enabled, of course, and we are assuming this is the case. The engineers recognized this importance and added an extra level of resiliency to HA. FDM uses a single-process agent. However, FDM spawns a watchdog process. In the unlikely event of an agent failure, the watchdog functionality will pick up on this and restart the agent to ensure HA functionality remains without anyone ever noticing it failed. The agent is also resilient to network interruptions and "all paths down" (APD) conditions. Inter-host communication automatically uses another communication path (if the host is configured with redundant management networks) in the case of a network failure.
HA has no dependency on DNS as it works with IP addresses only. This is one of the major improvements that FDM brought. This does not mean that ESXi hosts need to be registered with their IP addresses in vCenter; it is still a best practice to register ESXi hosts by their fully qualified domain name
(FQDN) in vCenter. Although HA does not depend on DNS, remember that other services may depend on it. On top of that, monitoring and troubleshooting will be much easier when hosts are correctly registered within vCenter and have a valid FQDN.
Basic design principle: Although HA is not dependent on DNS, it is still recommended to register the hosts with their FQDN for ease of operations/management.
vSphere HA also has a standardized logging mechanism, where a single log file has been created for all operational log messages; it is called fdm.log. This log file is stored under /var/log/ as depicted in Figure 6.
Figure 6 - HA log file
Basic design principle: Ensure syslog is correctly configured and log files are offloaded to a safe location to offer the possibility of performing a root cause analysis in case disaster strikes.
HOSTD Agent

One of the most crucial agents on a host is HOSTD. This agent is responsible for many of the tasks we take for granted, like powering on virtual machines. FDM talks directly to HOSTD and vCenter, so it is not dependent on VPXA, like in previous releases. This is, of course, to avoid any unnecessary overhead and dependencies, making HA more reliable than ever before and enabling HA to respond faster to power-on requests. That ultimately results in higher VM uptime.
When, for whatever reason, HOSTD is unavailable or not yet running after a restart, the host will not participate in any FDM-related processes. FDM relies on HOSTD for information about the virtual machines that are registered to the host, and manages the virtual machines using HOSTD APIs. In short, FDM is dependent on HOSTD and if HOSTD is not operational, FDM halts all functions and waits for HOSTD to become operational.
vCenter

That brings us to our final component, the vCenter Server. vCenter is the core of every vSphere Cluster and is responsible for many tasks these days. For our purposes, the following are the most important and the ones we will discuss in more detail:
Deploying and configuring HA agents
Communication of cluster configuration changes
Protection of virtual machines
vCenter is responsible for pushing out the FDM agent to the ESXi hosts when applicable. The push of these agents is done in parallel to allow for faster deployment and configuration of multiple hosts in a cluster. vCenter is also responsible for communicating configuration changes in the cluster to the host which is elected as the master. We will discuss this concept of master and slaves in the following chapter. Examples of configuration changes are modification or addition of an advanced setting or the introduction of a new host into the cluster.
HA leverages vCenter to retrieve information about the status of virtual machines and, of course, vCenter is used to display the protection status (Figure 7) of virtual machines. (What "virtual machine protection" actually means will be discussed in chapter 3.) On top of that, vCenter is responsible for the protection and unprotection of virtual machines. This not only
applies to user-initiated power-offs or power-ons of virtual machines, but also in the case where an ESXi host is disconnected from vCenter, at which point vCenter will request the master HA agent to unprotect the affected virtual machines.
Figure 7 - Virtual machine protection state
Although HA is configured by vCenter, and vCenter exchanges virtual machine state information with HA, vCenter is not involved when HA responds to failures. It is comforting to know that in the case of a failure of the host containing the virtualized vCenter Server, HA takes care of the failure and restarts the vCenter Server on another host, including all other configured virtual machines from that failed host.
There is a corner case scenario with regards to vCenter failure: if the ESXi hosts are so-called "stateless hosts" and Distributed vSwitches are used for the management network, virtual machine restarts will not be attempted until vCenter is restarted. For stateless environments, vCenter and Auto Deploy availability is key as the ESXi hosts literally depend on them.
If vCenter is unavailable, it will not be possible to make changes to the configuration of the cluster. vCenter is the source of truth for the set of virtual machines that are protected, the cluster configuration, the virtual machine-to-host compatibility information, and the host membership. So, while HA, by design, will respond to failures without vCenter, HA relies on vCenter to be available to configure or monitor the cluster.
When a virtual vCenter Server, or the vCenter Server Appliance, has been implemented, we recommend setting the correct HA restart priorities for it. Although vCenter Server is not required to restart virtual machines, there are multiple components that rely on vCenter and, as such, a speedy recovery is desired. When configuring your vCenter virtual machine with a
high priority for restarts, remember to include all services on which your vCenter Server depends for a successful restart: DNS, MS AD and MS SQL (or any other database server you are using).
Basic design principles:
1. In stateless environments, ensure vCenter and Auto Deploy are highly available as recovery time of your virtual machines might be dependent on them.
2. Understand the impact of virtualizing vCenter. Ensure it has high priority for restarts and ensure that services which vCenter Server depends on are available: DNS, AD and database.
Fundamental Concepts

Now that you know about the components of HA, it is time to start talking about some of the fundamental concepts of HA clusters:
Master / Slave agents
Heartbeating
Isolated vs Network partitioned
Virtual Machine Protection
Component Protection
Everyone who has implemented vSphere knows that multiple hosts can be configured into a cluster. A cluster can best be seen as a collection of resources. These resources can be carved up with the use of vSphere Distributed Resource Scheduler (DRS) into separate pools of resources, or used to increase availability by enabling HA.
The HA architecture introduces the concept of master and slave HA agents. Except during network partitions, which are discussed later, there is only one master HA agent in a cluster. Any agent can serve as a master, and all others are considered its slaves. A master agent is in charge of monitoring the health of virtual machines for which it is responsible and restarting any that fail. The slaves are responsible for forwarding information to the master agent and restarting any virtual machines at the direction of the master. The HA agent, regardless of its role as master or slave, also implements the VM/App monitoring feature, which allows it to restart virtual machines in the case of an Operating System failure, or restart services in the case of an application failure.
Master Agent

As stated, one of the primary tasks of the master is to keep track of the state of the virtual machines it is responsible for and to take action when appropriate. In a normal situation there is only a single master in a cluster. We will discuss the scenario where multiple masters can exist in a single cluster in one of the following sections, but for now let's talk about a cluster with a single master. A master will claim responsibility for a virtual machine by taking "ownership" of the datastore on which the virtual machine's configuration file is stored.
Basic design principle: To maximize the chance of restarting virtual machines after a failure, we recommend masking datastores on a cluster basis. Although sharing of datastores across clusters will work, it will increase complexity from an administrative perspective.
That is not all, of course. The HA master is also responsible for exchanging state information with vCenter. This means that it will not only receive but also send information to vCenter when required. The HA master is also the host that initiates the restart of virtual machines when a host has failed. You may immediately want to ask what happens when the master is the one that fails, or, more generically, which of the hosts can become the master and when is it elected?
Election
A master is elected by a set of HA agents whenever the agents are not in network contact with a master. A master election thus occurs when HA is first enabled on a cluster and when the host on which the master is running:
fails,
becomes network partitioned or isolated,
is disconnected from vCenter Server,
is put into maintenance or standby mode,
or when HA is reconfigured on the host.
The HA master election takes approximately 15 seconds and is conducted using UDP. While HA won't react to failures during the election, once a master is elected, failures detected before and during the election will be handled. The election process is simple but robust. The host that is participating in the election with the greatest number of connected datastores will be elected master. If two or more hosts have the same number of datastores connected, the one with the highest Managed Object Id will be chosen. This however is done lexically, meaning that 99 beats 100, as 9 is larger than 1. For each host, the HA state of the host will be shown on the Summary tab. This includes the role, as depicted in the screenshot below where the host is a master host.
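The two election criteria above (most connected datastores, lexical tie-break on the Managed Object Id) can be sketched in a few lines. This is an illustration of the comparison rule only, not FDM's actual implementation; the dictionary shape is invented for the example.

```python
def elect_master(hosts):
    """Pick the election winner: the host with the most connected
    datastores wins; ties are broken by the lexically highest Managed
    Object Id, compared as a string, so "99" beats "100" because the
    character '9' sorts after '1'."""
    return max(hosts, key=lambda h: (h["datastores"], h["moid"]))

# With equal datastore counts, "99" wins the tie-break over "100":
tied = [{"moid": "100", "datastores": 4}, {"moid": "99", "datastores": 4}]
winner = elect_master(tied)  # -> the "99" host
```

Python's tuple comparison mirrors the described rule: the datastore count is compared numerically first, and only on a tie does the string comparison of the MOID decide.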
After a master is elected, each slave that has management network connectivity with it will set up a single secure, encrypted TCP connection to the master. This secure connection is SSL-based. One thing to stress here though is that slaves do not communicate with each other after the master has been elected, unless a re-election of the master needs to take place.
Figure 8 - Master Agent
As stated earlier, when a master is elected it will try to acquire ownership of all of the datastores it can directly access, or access by proxying requests to one of the slaves connected to it using the management network. For regular storage architectures it does this by locking a file called "protectedlist" that is stored on the datastores in an existing cluster. The master will also attempt to take ownership of any datastores it discovers along the way, and it will periodically retry any it could not take ownership of previously.
The naming format and location of this file is as follows:
/<root of datastore>/.vSphere-HA/<cluster-specific-directory>/protectedlist
For those wondering how "cluster-specific-directory" is constructed:
<uuid of vCenter Server>-<number part of the MoID of the cluster>-<random 8 char string>-<name of the host running vCenter Server>
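The naming format above can be made concrete with a small, hypothetical helper that assembles the protectedlist path from its parts. The function and all sample values below are invented for illustration; they only mirror the format described in the text.

```python
def protectedlist_path(datastore_root: str, vc_uuid: str,
                       cluster_moid_number: str, random8: str,
                       vc_hostname: str) -> str:
    """Hypothetical helper: assemble the protectedlist location from
    the naming format described above."""
    cluster_dir = f"{vc_uuid}-{cluster_moid_number}-{random8}-{vc_hostname}"
    return f"{datastore_root}/.vSphere-HA/{cluster_dir}/protectedlist"

# Sample values are placeholders, not real identifiers:
path = protectedlist_path("/vmfs/volumes/datastore1",
                          "44bf43c5", "26", "a1b2c3d4", "vcenter01")
```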
The master uses this protectedlist file to store the inventory. It keeps track of which virtual machines are protected by HA. Calling it an inventory might be slightly overstating it: it is a list of protected virtual machines and it includes information around virtual machine CPU reservation and memory overhead. The master distributes this inventory across all datastores in use by the virtual machines in the cluster. The next screenshot shows an example of this file on one of the datastores.
Figure 9 - Protected list file
Now that we know the master locks a file on the datastore and that this file stores inventory details, what happens when the master is isolated or fails? If the master fails, the answer is simple: the lock will expire and the new master will relock the file if the datastore is accessible to it.
In the case of isolation, this scenario is slightly different, although the result is similar. The master will release the lock it has on the file on the datastore to ensure that when a new master is elected it can determine the set of virtual machines that are protected by HA by reading the file. If, by any chance, a master should fail right at the moment that it became isolated, the restart of the virtual machines will be delayed until a new master has been elected. In a scenario like this, accuracy and the fact that virtual machines are restarted is more important than a short delay.
Let's assume for a second that your master has just failed. What will happen and how do the slaves know that the master has failed? HA uses a point-to-point network heartbeat mechanism. If the slaves have received no network heartbeats from the master, the slaves will try to elect a new master. This new master will read the required information and will initiate the restart of the virtual machines within roughly 10 seconds.
Restarting virtual machines is not the only responsibility of the master. It is also responsible for monitoring the state of the slave hosts and reporting this state to vCenter Server. If a slave fails or becomes isolated from the management network, the master will determine which virtual machines must be restarted. When virtual machines need to be restarted, the master is also responsible for determining the placement of those virtual machines. It uses a placement engine that will try to distribute the virtual machines to be restarted evenly across all available hosts.
All of these responsibilities are really important, but without a mechanism to detect that a slave has failed, the master would be useless. Just like the slaves receive heartbeats from the master, the master receives heartbeats from the slaves so it knows they are alive.
Slaves

A slave has substantially fewer responsibilities than a master: a slave monitors the state of the virtual machines it is running and informs the master about any changes to this state.
The slave also monitors the health of the master by monitoring heartbeats. If the master becomes unavailable, the slaves initiate and participate in the election process. Last but not least, the slaves send heartbeats to the master so that the master can detect outages. Like the master-to-slave communication, all slave-to-master communication is point to point. HA does not use multicast.
Figure 10 - Slave Agent
Files for both Slave and Master

Before explaining the details, it is important to understand that both Virtual SAN and Virtual Volumes have introduced changes to the location and the usage of files. For specifics on these two different storage architectures we refer you to those respective sections in the book.
Both the master and slave use files not only to store state, but also as a communication mechanism. We've already seen the protectedlist file (Figure 9) used by the master to store the list of protected virtual machines. We will now discuss the files that are created by both
the master and the slaves. Remote files are files stored on a shared datastore, and local files are files that are stored in a location only directly accessible to that host.
Remote Files
The set of powered-on virtual machines is stored in a per-host "poweron" file. It should be noted that, because a master also hosts virtual machines, it also creates a "poweron" file.
The naming scheme for this file is as follows: host-number-poweron
Tracking virtual machine power-on state is not the only thing the "poweron" file is used for. This file is also used by the slaves to inform the master that it is isolated from the management network: the top line of the file will either contain a 0 or a 1. A 0 (zero) means not-isolated and a 1 (one) means isolated. The master will inform vCenter about the isolation of the host.
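The isolation-flag convention just described can be sketched as a small parser. Only the first-line flag semantics (0 = not isolated, 1 = isolated) come from the text; treating the remaining lines as the list of powered-on VMs is an assumption made for illustration, not a documented file format.

```python
def parse_poweron(contents: str):
    """Parse a host's 'poweron' file as described above: the first
    line is the isolation flag (0 = not isolated, 1 = isolated).
    Interpreting subsequent lines as powered-on VM identifiers is an
    assumption for this sketch."""
    lines = contents.strip().splitlines()
    isolated = lines[0].strip() == "1"
    powered_on = [ln.strip() for ln in lines[1:] if ln.strip()]
    return isolated, powered_on
```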
Local Files
As mentioned before, when HA is configured on a host, the host will store specific information about its cluster locally.
Figure 11 - Locally stored files
Each host, including the master, will store data locally. The data that is locally stored is important state information: namely, the VM-to-host compatibility matrix, cluster configuration, and host membership list. This information is persisted locally on each host. Updates to this information are sent to the master by vCenter and propagated by the master to the slaves. Although we expect that most of you will never touch these files - and we highly recommend against modifying them - we do want to explain how they are used:
clusterconfig - This file is not human-readable. It contains the configuration details of the cluster.
vmmetadata - This file is not human-readable. It contains the actual compatibility info matrix for every HA protected virtual machine and lists all the hosts with which it is compatible, plus a vm/host dictionary.
fdm.cfg - This file contains the configuration settings around logging. For instance, the level of logging and syslog details are stored in here.
hostlist - A list of hosts participating in the cluster, including hostname, IP addresses, MAC addresses and heartbeat datastores.
Heartbeating

We mentioned it a couple of times already in this chapter, and it is an important mechanism that deserves its own section: heartbeating. Heartbeating is the mechanism used by HA to validate whether a host is alive. HA has two different heartbeating mechanisms. These allow it to determine what has happened to a host when it is no longer responding. Let's discuss traditional network heartbeating first.
Network Heartbeating
Network heartbeating is used by HA to determine if an ESXi host is alive. Each slave will send a heartbeat to its master and the master sends a heartbeat to each of the slaves; this is point-to-point communication. These heartbeats are sent by default every second.
When a slave isn't receiving any heartbeats from the master, it will try to determine whether it is Isolated - we will discuss "states" in more detail later on in this chapter.
Basic design principle: Network heartbeating is key for determining the state of a host. Ensure the management network is highly resilient to enable proper state determination.
Datastore Heartbeating
Datastore heartbeating adds an extra level of resiliency and prevents unnecessary restart attempts from occurring, as it allows vSphere HA to determine whether a host is isolated from the network or is completely unavailable. How does this work?
Datastore heartbeating enables a master to more accurately determine the state of a host that is not reachable via the management network. The datastore heartbeat mechanism is used in case the master has lost network connectivity with the slaves. The datastore heartbeat mechanism is then used to validate whether a host has failed or is merely isolated/network
partitioned. Isolation will be validated through the "poweron" file which, as mentioned earlier, will be updated by the host when it is isolated. Without the "poweron" file, there is no way for the master to validate isolation. Let that be clear! Based on the results of checks of both files, the master will determine the appropriate action to take. If the master determines that a host has failed (no datastore heartbeats), the master will restart the failed host's virtual machines. If the master determines that the slave is Isolated or Partitioned, it will only take action when appropriate, meaning that the master will only initiate restarts when virtual machines are down or powered down/shut down by a triggered isolation response. We will discuss this in more detail in Chapter 4.
By default, HA selects 2 heartbeat datastores - it will select datastores that are available on all hosts, or as many as possible. Although it is possible to configure an advanced setting (das.heartbeatDsPerHost) to allow for more datastores for datastore heartbeating, we do not recommend configuring this option as the default should be sufficient for most scenarios, except for stretched cluster environments where it is recommended to have two in each site, manually selected.
The selection process gives preference to VMFS over NFS datastores, and seeks to choose datastores that are backed by different LUNs or NFS servers when possible. If desired, you can also select the heartbeat datastores yourself. We, however, recommend letting vCenter deal with this operational "burden" as vCenter uses a selection algorithm to select heartbeat datastores that are presented to all hosts. This however is not a guarantee that vCenter can select datastores which are connected to all hosts. It should be noted that vCenter is not site-aware. In scenarios where hosts are geographically dispersed it is recommended to manually select heartbeat datastores to ensure each site has one site-local heartbeat datastore at minimum.
Basic design principle: In a metro-cluster / geographically dispersed cluster we recommend setting the minimum number of heartbeat datastores to four. It is recommended to manually select site-local datastores, two for each site.
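The selection preferences described above (datastores seen by the most hosts, VMFS over NFS, distinct backing LUNs or NFS servers) can be sketched as a ranking function. The scoring and the dictionary shape are assumptions for illustration; this is not vCenter's actual algorithm.

```python
def pick_heartbeat_datastores(datastores, count=2):
    """Sketch of the stated preferences (not vCenter's real algorithm):
    rank by number of hosts that can see the datastore, prefer VMFS
    over NFS, and avoid reusing the same backing LUN / NFS server.
    May return fewer than 'count' if all candidates share a backing."""
    ranked = sorted(datastores,
                    key=lambda d: (-d["hosts_connected"], d["type"] != "VMFS"))
    chosen, backings = [], set()
    for ds in ranked:
        if len(chosen) == count:
            break
        if ds["backing"] in backings:
            continue  # seek a different LUN / NFS server
        chosen.append(ds)
        backings.add(ds["backing"])
    return chosen
```

Sorting on `(-hosts_connected, type != "VMFS")` puts widely connected datastores first and, within a tie, VMFS before NFS, since `False` sorts before `True` in Python.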
Figure 12 - Selecting the heartbeat datastores
The question now arises: what, exactly, is this datastore heartbeating and which datastore is used for this heartbeating? Let's answer which datastore is used for datastore heartbeating first, as we can simply show that with a screenshot, see below. vSphere displays extensive details around the "Cluster Status" on the Cluster's Monitor tab. This for instance shows you which datastores are being used for heartbeating and which hosts are using which specific datastore(s). In addition, it displays how many virtual machines are protected and how many hosts are connected to the master.
In block based storage environments HA leverages an existing VMFS file system mechanism. The datastore heartbeat mechanism uses a so-called "heartbeat region" which is updated as long as the file is open. On VMFS datastores, HA will simply check whether the heartbeat region has been updated. In order to update a datastore heartbeat region, a
vSphere6.xHADeepdive
26FundamentalConcepts
hostneedstohaveatleastoneopenfileonthevolume.HAensuresthereisatleastonefileopenonthisvolumebycreatingafilespecificallyfordatastoreheartbeating.Inotherwords,aper-hostfileiscreatedonthedesignatedheartbeatingdatastores,asshownbelow.Thenamingschemeforthisfileisasfollows:host-number-hb.
On NFS datastores, each host will write to its heartbeat file once every 5 seconds, ensuring that the master will be able to check host state. The master will simply validate this by checking that the time-stamp of the file changed.
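The master's check on an NFS heartbeat file amounts to a timestamp comparison. The sketch below is illustrative only; the three-interval grace period is our assumption for the example, not a documented FDM constant.

```python
import time

HEARTBEAT_INTERVAL = 5  # NFS heartbeat files are rewritten every 5 seconds


def heartbeat_is_live(last_update_ts, now=None, missed_allowed=3):
    """Return True if the heartbeat file's timestamp changed recently
    enough to consider the host alive on this datastore. The grace
    period (missed_allowed intervals) is an illustrative assumption."""
    if now is None:
        now = time.time()
    return (now - last_update_ts) <= missed_allowed * HEARTBEAT_INTERVAL
```

With a 5-second write interval and a 3-interval grace period, a file last touched 10 seconds ago still counts as live, while one untouched for 16 seconds does not.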
Realize that in the case of a converged network environment, the effectiveness of datastore heartbeating will vary depending on the type of failure. For instance, a NIC failure could impact both network and datastore heartbeating. If, for whatever reason, the datastore or NFS share becomes unavailable or is removed from the cluster, HA will detect this and select a new datastore or NFS share to use for the heartbeating mechanism.
Basic design principle: Datastore heartbeating adds a new level of resiliency but is not the be-all end-all. In converged networking environments, the use of datastore heartbeating adds little value due to the fact that a NIC failure may result in both the network and storage becoming unavailable.
Isolated versus Partitioned

We've already briefly touched on it and it is time to have a closer look. When it comes to network failures there are two different states that exist. What are these exactly and when is a host Partitioned rather than Isolated? Before we explain this we want to point out that there is the state as reported by the master and the state as observed by an administrator, and the characteristics these have.
First, consider the administrator's perspective. Two hosts are considered partitioned if they are operational but cannot reach each other over the management network. Further, a host is isolated if it does not observe any HA management traffic on the management network and it can't ping the configured isolation addresses. It is possible for multiple hosts to be isolated at the same time. We call a set of hosts that are partitioned but can communicate with each other a "management network partition". Network partitions involving more than two partitions are possible but not likely.
Now, consider the HA perspective. When any HA agent is not in network contact with a master, it will elect a new master. So, when a network partition exists, a master election will occur so that a host failure or network isolation within this partition will result in appropriate action on the impacted virtual machine(s). The screenshot below shows possible ways in which an Isolation or a Partition can occur.
Figure 13 - Isolated versus Partitioned
If a cluster is partitioned in multiple segments, each partition will elect its own master, meaning that if you have 4 partitions your cluster will have 4 masters. When the network partition is corrected, any of the four masters will take over the role and be responsible for the cluster again. It should be noted that a master could claim responsibility for a virtual machine that lives in a different partition. If this occurs and the virtual machine happens to fail, the master will be notified through the datastore communication mechanism.
In the HA architecture, whether a host is partitioned is determined by the master reporting the condition. So, in the above example, the master on host ESXi-01 will report ESXi-03 and 04 as partitioned, while the master on host 04 will report 01 and 02 as partitioned. When a partition occurs, vCenter reports the perspective of one master.
A master reports a host as partitioned or isolated when it can't communicate with the host over the management network but it can observe the host's datastore heartbeats via the heartbeat datastores. The master cannot alone differentiate between these two states – a host is reported as isolated only if the host informs the master via the datastores that it is isolated.
This still leaves the question open how the master differentiates between a Failed, Partitioned, or Isolated host.
When the master stops receiving network heartbeats from a slave, it will check for host "liveness" for the next 15 seconds. Before the host is declared failed, the master will validate if it has actually failed or not by doing additional liveness checks. First, the master will validate if the host is still heartbeating to the datastore. Second, the master will ping the management IP address of the host. If both are negative, the host will be declared Failed. This doesn't necessarily mean the host has PSOD'ed; it could be the network is unavailable, including the storage network, which would make this host Isolated from an administrator's perspective but Failed from an HA perspective. As you can imagine, however, there are various combinations possible. The following table depicts these combinations including the "state".
State            Network Heartbeat   Storage Heartbeat   Host Liveness Ping   Isolation Criteria Met
Running          Yes                 N/A                 N/A                  N/A
Isolated         No                  Yes                 No                   Yes
Partitioned      No                  Yes                 No                   No
Failed           No                  No                  No                   N/A
FDM Agent Down   N/A                 N/A                 Yes                  N/A
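The table can be captured as a small decision function. This is our paraphrase of the table for illustration, using booleans where the table lists Yes/No and treating the N/A inputs as False:

```python
def classify_host_state(network_hb, storage_hb, liveness_ping, isolation_declared):
    """Map the heartbeat/ping observations from the state table to a
    host state. All arguments are booleans; pass False where the table
    says N/A, since those inputs do not influence the outcome."""
    if network_hb:
        return "Running"
    if liveness_ping:
        return "FDM Agent Down"
    if storage_hb:
        return "Isolated" if isolation_declared else "Partitioned"
    return "Failed"
```

Each row of the table maps to one branch: storage heartbeats alone distinguish Partitioned from Failed, and the host's own declaration (via the datastores) distinguishes Isolated from Partitioned.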
HA will trigger an action based on the state of the host. When the host is marked as Failed, a restart of the virtual machines will be initiated. When the host is marked as Isolated, the master might initiate the restarts.
The one thing to keep in mind when it comes to isolation response is that a virtual machine will only be shut down or powered off when the isolated host knows there is a master out there that has taken ownership for the virtual machine, or when the isolated host loses access to the home datastore of the virtual machine.
For example, if a host is isolated and runs two virtual machines, stored on separate datastores, the host will validate if it can access each of the home datastores of those virtual machines. If it can, the host will validate whether a master owns these datastores. If no master owns the datastores, the isolation response will not be triggered and restarts will not be initiated. If the host does not have access to the datastore, for instance during an "All Paths Down" condition, HA will trigger the isolation response to ensure the "original" virtual machine is powered down and will be safely restarted. This avoids so-called "split-brain" scenarios.
To reiterate, as this is a very important aspect of HA and how it handles network isolations, the remaining hosts in the cluster will only be requested to restart virtual machines when the master has detected that either the host has failed, or has become isolated and the isolation response was triggered.
Virtual Machine Protection

Virtual machine protection happens on several layers but is ultimately the responsibility of vCenter. We have explained this briefly but want to expand on it a bit more to make sure everyone understands the dependency on vCenter when it comes to protecting virtual machines. We do want to stress that this only applies to protecting virtual machines; virtual machine restarts in no way require vCenter to be available at the time.
When the state of a virtual machine changes, vCenter will direct the master to enable or disable HA protection for that virtual machine. Protection, however, is only guaranteed when the master has committed the change of state to disk. The reason for this, of course, is that a failure of the master would result in the loss of any state changes that exist only in memory. As pointed out earlier, this state is distributed across the datastores and stored in the "protectedlist" file.
When the power state change of a virtual machine has been committed to disk, the master will inform vCenter Server so that the change in status is visible both for the user in vCenter and for other processes like monitoring tools.
To clarify the process, we have created a workflow diagram of the protection of a virtual machine from the point it is powered on through vCenter:
Figure 14 - Virtual Machine protection workflow
But what about "unprotection"? When a virtual machine is powered off, it must be removed from the protected list. We have documented this workflow in the following diagram for the situation where the power off is invoked from vCenter.
Figure 15 - Virtual Machine Unprotection workflow
Restarting Virtual Machines

In the previous chapter, we have described most of the lower level fundamental concepts of HA. We have shown you that multiple mechanisms increase resiliency and reliability of HA. Reliability of HA in this case mostly refers to restarting (or resetting) virtual machines, as that remains HA's primary task.
HA will respond when the state of a host has changed, or, better said, when the state of one or more virtual machines has changed. There are multiple scenarios in which HA will respond to a virtual machine failure, the most common of which are listed below:
Failed host
Isolated host
Failed guest operating system
Depending on the type of failure, but also depending on the role of the host, the process will differ slightly. Changing the process results in slightly different recovery timelines. There are many different scenarios and there is no point in covering all of them, so we will try to describe the most common scenario and include timelines where possible.
Before we dive into the different failure scenarios, we want to explain how restart priority and retries work.
Restart Priority and Order

HA can take the configured priority of the virtual machine into account when restarting VMs. However, it is good to know that Agent VMs take precedence during the restart procedure as the "regular" virtual machines may rely on them. A good example of an agent virtual machine is a virtual storage appliance.
Prioritization is done by each host and not globally. Each host that has been requested to initiate restart attempts will attempt to restart all top priority virtual machines before attempting to start any other virtual machines. If the restart of a top priority virtual machine fails, it will be retried after a delay. In the meantime, however, HA will continue powering on the remaining virtual machines. Keep in mind that some virtual machines might be dependent on the agent virtual machines. You should document which virtual machines are dependent on which agent virtual machines and document the process to start up these services in the right order in the case the automatic restart of an agent virtual machine fails.
Basic design principle: Virtual machines can be dependent on the availability of agent virtual machines or other virtual machines. Although HA will do its best to ensure all virtual machines are started in the correct order, this is not guaranteed. Document the proper recovery process.
Besides agent virtual machines, HA also prioritizes FT secondary machines. We have listed the full order in which virtual machines will be restarted below:
Agent virtual machines
FT secondary virtual machines
Virtual machines configured with a restart priority of high
Virtual machines configured with a medium restart priority
Virtual machines configured with a low restart priority
It should be noted that HA will not place any virtual machines on a host if the required number of agent virtual machines is not running on the host at the time placement is done.
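The restart order above amounts to a fixed ranking. A minimal sketch (the category labels are hypothetical, not HA's internal names):

```python
# Ranking per the restart order listed above (lower = restarted earlier).
RESTART_RANK = {"agent": 0, "ft-secondary": 1, "high": 2, "medium": 3, "low": 4}


def restart_order(vms):
    """Sort (name, category) pairs into HA's restart order. Python's
    sort is stable, so VMs within a category keep their given order."""
    return sorted(vms, key=lambda vm: RESTART_RANK[vm[1]])
```

For example, given a mix of a storage appliance (agent), an FT secondary, and high/low priority VMs, the agent VM always sorts first regardless of input order.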
Now that we have briefly touched on it, we would also like to address "restart retries" and parallelization of restarts as that more or less dictates how long it could take before all virtual machines of a failed or isolated host are restarted.
Restart Retries

The number of retries is configurable as of vCenter 2.5 U4 with the advanced option "das.maxvmrestartcount". The default value is 5. Note that the initial restart is included.
HA will try to start the virtual machine on one of your hosts in the affected cluster; if this is unsuccessful on that host, the restart count will be increased by 1. Before we go into the exact timeline, let it be clear that T0 is the point at which the master initiates the first restart attempt. This by itself could be 30 seconds after the virtual machine has failed. The elapsed time between the failure of the virtual machine and the restart, though, will depend on the scenario of the failure, which we will discuss in this chapter.
As said, the default number of restarts is 5. There are specific times associated with each of these attempts. The following bullet list will clarify this concept. The 'm' stands for "minutes" in this list.
T0 – Initial Restart
T2m – Restart retry 1
T6m – Restart retry 2
T14m – Restart retry 3
T30m – Restart retry 4
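The intervals in this list follow a doubling pattern (2, 4, 8, 16 minutes between attempts). A sketch that reproduces the documented timeline; note that the 16-minute cap is inferred from the sequence above, not a published constant:

```python
def restart_schedule(max_attempts=5, first_delay_min=2, delay_cap_min=16):
    """Return the minute marks (relative to T0) of each restart
    attempt: the delay between attempts doubles, capped at
    delay_cap_min. Defaults reproduce the T0/T2/T6/T14/T30 sequence."""
    marks, t, delay = [], 0, first_delay_min
    for _ in range(max_attempts):
        marks.append(t)
        t += delay
        delay = min(delay * 2, delay_cap_min)
    return marks
```

Raising das.maxvmrestartcount simply extends the sequence; with the inferred cap, each extra attempt adds 16 minutes.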
Figure 16 - High Availability restart timeline
As clearly depicted in the diagram above, a successful power-on attempt could take up to ~30 minutes in the case where multiple power-on attempts are unsuccessful. This is, however, not exact science. For instance, there is a 2-minute waiting period between the initial restart and the first restart retry. HA will start the 2-minute wait as soon as it has detected that the initial attempt has failed. So, in reality, T2 could be T2 plus 8 seconds. Another important fact that we want to emphasize is that there is no coordination between masters, and so if multiple ones are involved in trying to restart the virtual machine, each will retain their own sequence. Multiple masters could attempt to restart a virtual machine. Although only one will succeed, it might change some of the timelines.
What about VMs which are "disabled" for HA? What will happen with those VMs? Before vSphere 6.0 those VMs would be left alone; as of vSphere 6.0 these VMs will be registered on another host after a failure. This will allow you to easily power on that VM when needed without needing to manually re-register it yourself. Note, HA will not power on the VM, it will just register it for you!
Let's give an example to clarify the scenario in which a master fails during a restart sequence:
Cluster: 4 hosts (esxi01, esxi02, esxi03, esxi04)
Master: esxi01
The host "esxi02" is running a single virtual machine called "vm01" and it fails. The master, esxi01, will try to restart it but the attempt fails. It will try restarting "vm01" up to 5 times but, unfortunately, on the 4th try, the master also fails. An election occurs and "esxi03" becomes the new master. It will now initiate the restart of "vm01", and if that restart would fail it will retry it up to 4 times again, for a total of 5 including the initial restart.
Be aware, though, that a successful restart might never occur if the restart count is reached and all five restart attempts (the default value) were unsuccessful.
When it comes to restarts, one thing that is very important to realize is that HA will not issue more than 32 concurrent power-on tasks on a given host. To make that more clear, let's use the example of a two host cluster: if a host fails which contained 33 virtual machines and all of these had the same restart priority, 32 power-on attempts would be initiated. The 33rd power-on attempt will only be initiated when one of those 32 attempts has completed, regardless of the success or failure of that attempt.
Now, here comes the gotcha. If there are 32 low-priority virtual machines to be powered on and a single high-priority virtual machine, the power-on attempt for the low-priority virtual machines will not be issued until the power-on attempt for the high-priority virtual machine has completed. Let it be absolutely clear that HA does not wait to restart the low-priority virtual machines until the high-priority virtual machines are started; it waits for the issued power-on attempt to be reported as "completed". In theory, this means that if the power-on attempt fails, the low-priority virtual machines could be powered on before the high-priority virtual machine.
The restart priority, however, does guarantee that when a placement is done, the higher priority virtual machines get first right to any available resources.
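The 32-task limit behaves like a bounded set of in-flight power-on attempts. A simplified sketch of the mechanics described above (function and variable names are illustrative, not FDM's):

```python
from collections import deque

MAX_CONCURRENT_POWER_ONS = 32  # per-host limit on concurrent power-on tasks


def issue_power_ons(pending, in_flight):
    """Move power-on tasks from the priority-ordered `pending` queue
    into `in_flight` until the per-host limit is reached. Remaining
    tasks wait for an in-flight attempt to complete (success or
    failure both free a slot)."""
    while pending and len(in_flight) < MAX_CONCURRENT_POWER_ONS:
        in_flight.append(pending.popleft())
    return in_flight
```

With 33 pending VMs, 32 attempts are issued immediately and the 33rd is issued only once any one of the 32 completes.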
Basic design principle: Configuring restart priority of a virtual machine is not a guarantee that virtual machines will actually be restarted in this order. Ensure proper operational procedures are in place for restarting services or virtual machines in the appropriate order in the event of a failure.
Now that we know how virtual machine restart priority and restart retries are handled, it is time to look at the different scenarios.
Failed host
  Failure of a master
  Failure of a slave
Isolated host and response
Failed Host

When discussing a failed host scenario it is needed to make a distinction between the failure of a master versus the failure of a slave. We want to emphasize this because the time it takes before a restart attempt is initiated differs between these two scenarios. Although the majority of you probably won't notice the time difference, it is important to call out. Let's start with the most common failure, that of a host failing, but note that failures generally occur infrequently. In most environments, hardware failures are very uncommon to begin with. Just in case it happens, it doesn't hurt to understand the process and its associated timelines.
The Failure of a Slave
The failure of a slave host is a fairly complex scenario. Part of this complexity comes from the introduction of a new heartbeat mechanism. Actually, there are two different scenarios: one where heartbeat datastores are configured and one where heartbeat datastores are not configured. Keeping in mind that this is an actual failure of the host, the timeline is as follows:
T0 – Slave failure.
T3s – Master begins monitoring datastore heartbeats for 15 seconds.
T10s – The host is declared unreachable and the master will ping the management network of the failed host. This is a continuous ping for 5 seconds.
T15s – If no heartbeat datastores are configured, the host will be declared dead.
T18s – If heartbeat datastores are configured, the host will be declared dead.
The master monitors the network heartbeats of a slave. When the slave fails, these heartbeats will no longer be received by the master. We have defined this as T0. After 3 seconds (T3s), the master will start monitoring for datastore heartbeats and it will do this for 15 seconds. On the 10th second (T10s), when no network or datastore heartbeats have been detected, the host will be declared as "unreachable". The master will also start pinging the management network of the failed host at the 10th second and it will do so for 5 seconds. If no heartbeat datastores were configured, the host will be declared "dead" at the 15th second (T15s) and virtual machine restarts will be initiated by the master. If heartbeat datastores have been configured, the host will be declared dead at the 18th second (T18s) and restarts will be initiated. We realize that this can be confusing and hope the timeline depicted in the diagram below makes it easier to digest.
Figure 17 - Restart timeline slave failure
The master filters the virtual machines it thinks failed before initiating restarts. The master uses the protected list for this. On-disk state can be obtained only by one master at a time, since it requires opening the protected list file in exclusive mode. If there is a network partition, multiple masters could try to restart the same virtual machine, as vCenter Server also provides the necessary details for a restart. As an example, it could happen that a master has locked a virtual machine's home datastore and has access to the protected list, while the other master is in contact with vCenter Server and as such is aware of the current desired protected state. In this scenario it could happen that the master which does not own the home datastore of the virtual machine will restart the virtual machine based on the information provided by vCenter Server.
This change in behavior was introduced to avoid the scenario where a restart of a virtual machine would fail due to insufficient resources in the partition which was responsible for the virtual machine. With this change, there is less chance of such a situation occurring as the master in the other partition would be using the information provided by vCenter Server to initiate the restart.
That leaves us with the question of what happens in the case of the failure of a master.
The Failure of a Master
In the case of a master failure, the process and the associated timeline are slightly different. The reason being that there needs to be a master before any restart can be initiated. This means that an election will need to take place amongst the slaves. The timeline is as follows:
T0 – Master failure.
T10s – Master election process initiated.
T25s – New master elected and reads the protected list.
T35s – New master initiates restarts for all virtual machines on the protected list which are not running.
Slaves receive network heartbeats from their master. If the master fails, let's define this as T0 (T zero), the slaves detect this when the network heartbeats cease to be received. As every cluster needs a master, the slaves will initiate an election at T10s. The election process takes 15s to complete, which brings us to T25s. At T25s, the new master reads the protected list. This list contains all the virtual machines which are protected by HA. At T35s, the master initiates the restart of all virtual machines that are protected but not currently running. The timeline depicted in the diagram below hopefully clarifies the process.
Figure 18 - Restart timeline master failure
Besides the failure of a host, there is another reason for restarting virtual machines: an isolation event.
Isolation Response and Detection

Before we discuss the timeline and the process around the restart of virtual machines after an isolation event, we will discuss Isolation Response and Isolation Detection. One of the first decisions that will need to be made when configuring HA is the "Isolation Response".
Isolation Response
The Isolation Response refers to the action that HA takes for its virtual machines when the host has lost its connection with the network and the remaining nodes in the cluster. This does not necessarily mean that the whole network is down; it could just be the management network ports of this specific host. Today there are three isolation responses: "Power off", "Leave powered on" and "Shut down". This isolation response answers the question, "what should a host do with the virtual machines it manages when it detects that it is isolated from the network?" Let's discuss these three options more in-depth:
Power off – When isolation occurs, all virtual machines are powered off. It is a hard stop, or to put it bluntly, the "virtual" power cable of the virtual machine will be pulled out!
Shut down – When isolation occurs, all virtual machines running on the host will be shut down using a guest-initiated shutdown through VMware Tools. If this is not successful within 5 minutes, a "power off" will be executed. This time-out value can be adjusted by setting the advanced option das.isolationShutdownTimeout. If VMware Tools is not installed, a "power off" will be initiated immediately.
Leave powered on – When isolation occurs on the host, the state of the virtual machines remains unchanged.
This setting can be changed in the cluster settings under virtual machine options.
Figure 19 - Cluster default settings
The default setting for the isolation response has changed multiple times over the last couple of years and this has caused some confusion.
Up to ESXi 3.5 U2 / vCenter 2.5 U2 the default isolation response was "Power off"
With ESXi 3.5 U3 / vCenter 2.5 U3 this was changed to "Leave powered on"
With vSphere 4.0 it was changed to "Shut down"
With vSphere 5.0 it was changed to "Leave powered on"
Keep in mind that these changes are only applicable to newly created clusters. When creating a new cluster, it may be required to change the default isolation response based on the configuration of existing clusters and/or your customer's requirements, constraints and expectations. When upgrading an existing cluster, it might be wise to apply the latest default values. You might wonder why the default has changed once again. There was a lot of feedback from customers that "Leave powered on" was the desired default value.
Basic design principle: Before upgrading an environment to later versions, ensure you validate the best practices and default settings. Document them, including justification, to ensure all people involved understand your reasons.
The question remains, which setting should be used? The obvious answer applies here; it depends. We prefer "Leave powered on" because it eliminates the chances of having a false positive and its associated downtime. One of the problems that people have experienced in the past is that HA triggered its isolation response when the full management network went down, basically resulting in the power off (or shutdown) of every single virtual machine and none being restarted. This problem has been mitigated. HA will validate if virtual machine restarts can be attempted – there is no reason to incur any downtime unless absolutely necessary. It does this by validating that a master owns the datastore the virtual machine is stored on. Of course, the isolated host can only validate this if it has access to the datastores. In a converged network environment with iSCSI storage, for instance, it would be impossible to validate this during a full isolation as the validation would fail due to the inaccessible datastore from the perspective of the isolated host.
We feel that changing the isolation response is most useful in environments where a failure of the management network is likely correlated with a failure of the virtual machine network(s). If the failure of the management network won't likely correspond with the failure of the virtual machine networks, the isolation response would cause unnecessary downtime as the virtual machines can continue to run without management network connectivity to the host.
A second use for power off / shut down is in scenarios where the virtual machine retains access to the virtual machine network but loses access to its storage; leaving the virtual machine powered on could result in two virtual machines on the network with the same IP address.
It is still difficult to decide which isolation response should be used. The following table was created to provide some more guidelines.
Likely datastore access, likely VM network access – Leave Powered On. The virtual machine is running fine; there is no reason to power it off.
Likely datastore access, unlikely VM network access – Either Leave Powered On or Shut Down. Choose Shut Down to allow HA to restart virtual machines on hosts that are not isolated and hence are likely to have access to storage.
Unlikely datastore access, likely VM network access – Power Off. Use Power Off to avoid having two instances of the same virtual machine on the virtual machine network.
Unlikely datastore access, unlikely VM network access – Leave Powered On or Power Off. Leave Powered On if the virtual machine can recover from the network/datastore outage if it is not restarted because of the isolation, and Power Off if it likely can't.
The question that we haven't answered yet is how HA knows which virtual machines have been powered off due to the triggered isolation response, and why the isolation response is more reliable than with previous versions of HA. Previously, HA did not care and would always try to restart the virtual machines according to the last known state of the host. That is no longer the case. Before the isolation response is triggered, the isolated host will verify whether a master is responsible for the virtual machine.
As mentioned earlier, it does this by validating if a master owns the home datastore of the virtual machine. When the isolation response is triggered, the isolated host removes the virtual machines which are powered off or shut down from the "poweron" file. The master will recognize that the virtual machines have disappeared and initiate a restart. On top of that, when the isolation response is triggered, it will create a per-virtual machine file under a "poweredoff" directory which indicates for the master that this virtual machine was powered down as a result of a triggered isolation response. This information will be read by the master node when it initiates the restart attempt in order to guarantee that only virtual machines that were powered off / shut down by HA will be restarted by HA.
This is, however, only one part of the increased reliability of HA. Reliability has also been improved with respect to "isolation detection," which will be described in the following section.
Isolation Detection
We have explained what the options are to respond to an isolation event and what happens when the selected response is triggered. However, we have not extensively discussed how isolation is detected. The mechanism is fairly straightforward and works with heartbeats, as earlier explained. There are, however, two scenarios again, and the process and associated timelines differ for each of them:
Isolation of a slave
Isolation of a master
Before we explain the differences in process between both scenarios, we want to make sure it is clear that a change in state will result in the isolation response not being triggered in either scenario. Meaning that if a single ping is successful or the host observes election traffic and is elected a master or slave, the isolation response will not be triggered, which is exactly what you want, as avoiding downtime is at least as important as recovering from downtime. When a host has declared itself isolated and observes election traffic it will declare itself no longer isolated.
Isolation of a Slave
HA triggers a master election process before it will declare a host isolated. In the below timeline, "s" refers to seconds.
T0 – Isolation of the host (slave)
T10s – Slave enters "election state"
T25s – Slave elects itself as master
T25s – Slave pings "isolation addresses"
T30s – Slave declares itself isolated
T60s – Slave "triggers" isolation response
When the isolation response is triggered, HA creates a "power-off" file for any virtual machine HA powers off whose home datastore is accessible. Next it powers off the virtual machine (or shuts it down) and updates the host's "poweron" file. The power-off file is used to record that HA powered off the virtual machine and so HA should restart it. These power-off files are deleted when a virtual machine is powered back on or HA is disabled.
After the completion of this sequence, the master will learn the slave was isolated through the "poweron" file as mentioned earlier, and will restart virtual machines based on the information provided by the slave.
Figure 20 - Isolation of a slave timeline
Isolation of a Master
In the case of the isolation of a master, this timeline is a bit less complicated because there is no need to go through an election process. In this timeline, "s" refers to seconds.
T0 – Isolation of the host (master)
T0 – Master pings "isolation addresses"
T5s – Master declares itself isolated
T35s – Master "triggers" isolation response
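Comparing the two timelines: a slave spends T10s–T25s on an election before it can declare itself isolated, while in both cases the response fires 30 seconds after the declaration (matching the default isolation policy delay). A sketch:

```python
def isolation_timeline(role):
    """Key second marks for an isolation event, per the slave and
    master timelines above. In both cases the isolation response
    triggers 30 seconds after the host declares itself isolated."""
    if role == "slave":
        return {"election_starts": 10, "declares_isolated": 30, "triggers_response": 60}
    if role == "master":
        return {"declares_isolated": 5, "triggers_response": 35}
    raise ValueError("role must be 'slave' or 'master'")
```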
Additional Checks
Before a host declares itself isolated, it will ping the default isolation address, which is the gateway specified for the management network, and will continue to ping the address until it becomes unisolated. HA gives you the option to define one or multiple additional isolation addresses using an advanced setting. This advanced setting is called das.isolationaddress and could be used to reduce the chances of having a false positive. We recommend setting an additional isolation address. If a secondary management network is configured, this additional address should be part of the same network as the secondary management network. If required, you can configure up to 10 additional isolation addresses. A secondary management network will more than likely be on a different subnet and it is recommended to specify an additional isolation address which is part of that subnet.
Figure 21 - Isolation Address
Selecting an Additional Isolation Address

A question asked by many people is which address should be specified for this additional isolation verification. We generally recommend an isolation address close to the hosts to avoid too many network hops, and an address that would correlate with the liveness of the virtual machine network. In many cases, the most logical choice is the physical switch to which the host is directly connected. Basically, use the gateway for whatever subnet your management network is on. Another usual suspect would be a router or any other reliable and pingable device on the same subnet. However, when you are using IP-based shared storage like NFS or iSCSI, the IP address of the storage device can also be a good choice.
Basic design principle: Select a reliable secondary isolation address. Try to minimize the number of "hops" between the host and this address.
Isolation Policy Delay

For those who want to increase the time it takes before HA executes the isolation response, an advanced setting is available. This setting is called "das.config.fdm.isolationPolicyDelaySec" and allows changing the number of seconds to wait before the isolation policy is executed. The minimum value is 30. If set to a value less than 30, the delay will be 30 seconds. We do not recommend changing this advanced setting unless there is a specific requirement to do so. In almost all scenarios 30 seconds should suffice.
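The clamping behavior of this setting is simple to express; a sketch (the helper name is ours, for illustration):

```python
DEFAULT_ISOLATION_DELAY_SEC = 30


def effective_isolation_delay(configured=None):
    """Effective das.config.fdm.isolationPolicyDelaySec value: unset
    means the 30 second default, and any value below 30 is raised to
    the 30 second minimum."""
    if configured is None:
        return DEFAULT_ISOLATION_DELAY_SEC
    return max(DEFAULT_ISOLATION_DELAY_SEC, configured)
```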
Restarting Virtual Machines

The most important procedure has not yet been explained: restarting virtual machines. We have dedicated a full section to this concept.
We have explained the difference in behavior from a timing perspective for restarting virtual machines in the case of both master node and slave node failures. For now, let's assume that a slave node has failed. When the master node declares the slave node as Partitioned or Isolated, it determines which virtual machines were running on it using the information it previously read from the host's "poweron" file. These files are asynchronously read approximately every 30s. If the host was not Partitioned or Isolated before the failure, the master uses cached data to determine the virtual machines that were last running on the host before the failure occurred.
Before it will initiate the restart attempts, though, the master will first validate that the virtual machine should be restarted. This validation uses the protection information vCenter Server provides to each master, or, if the master is not in contact with vCenter Server, the information saved in the protected list files. If the master is not in contact with vCenter Server or has not locked the file, the virtual machine is filtered out. At this point, all virtual machines having a restart priority of "disabled" are also filtered out.
Now that HA knows which virtual machines it should restart, it is time to decide where the virtual machines are placed. HA will take multiple things into account:
CPU and memory reservation, including the memory overhead of the virtual machine
Unreserved capacity of the hosts in the cluster
Restart priority of the virtual machine relative to the other virtual machines that need to be restarted
Virtual-machine-to-host compatibility set
The number of dvPorts required by a virtual machine and the number available on the candidate hosts
The maximum number of vCPUs and virtual machines that can be run on a given host
Restart latency
Whether the active hosts are running the required number of agent virtual machines
Restart latency refers to the amount of time it takes to initiate virtual machine restarts. This means that virtual machine restarts will be distributed by the master across multiple hosts to avoid a boot storm, and thus a delay, on a single host.
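A few of the placement criteria above can be pictured as a capacity-and-compatibility filter. The dictionary fields below are illustrative only, not FDM's actual data model:

```python
def candidate_hosts(vm, hosts):
    """Return names of hosts that could take the VM: enough unreserved
    CPU and memory (including the VM's memory overhead) and membership
    in the VM's host compatibility set."""
    return [
        h["name"]
        for h in hosts
        if h["unreserved_cpu_mhz"] >= vm["cpu_reservation_mhz"]
        and h["unreserved_mem_mb"] >= vm["mem_reservation_mb"] + vm["mem_overhead_mb"]
        and h["name"] in vm["compatible_hosts"]
    ]
```

Hosts that fail any one criterion drop out of the candidate set; if the set is empty the VM would land on the pending placement list described below.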
If a placement is found, the master will send each target host the set of virtual machines it needs to restart. If this list exceeds 32 virtual machines, HA will limit the number of concurrent power-on attempts to 32. If a virtual machine successfully powers on, the node on which the virtual machine was powered on will inform the master of the change in power state. The master will then remove the virtual machine from the restart list.
If a placement cannot be found, the master will place the virtual machine on a "pending placement list" and will retry placement of the virtual machine when one of the following conditions changes:
A new virtual-machine-to-host compatibility list is provided by vCenter.
A host reports that its unreserved capacity has increased.
A host (re)joins the cluster (for instance, when a host is taken out of maintenance mode, a host is added to a cluster, etc.)
A new failure is detected and virtual machines have to be failed over.
A failure occurred when failing over a virtual machine.
ButwhataboutDRS?Wouldn’tDRSbeabletohelpduringtheplacementofvirtualmachineswhenallelsefails?Itdoes.ThemasternodewillreporttovCenterthesetofvirtualmachinesthatwerenotplacedduetoinsufficientresources,asisthecasetoday.IfDRSisenabled,thisinformationwillbeusedinanattempttohaveDRSmakecapacityavailable.
ComponentProtectionInvSphere6.0anewfeatureaspartofvSphereHAisintroducedcalledVMComponentProtection.VMComponentProtection(VMCP)invSphere6.0allowsyoutoprotectvirtualmachinesagainstthefailureofyourstoragesystem.TherearetwotypesoffailuresVMCPwillrespondtoandthosearePermanentDeviceLoss(PDL)andAllPathsDown(APD).Beforewelookatsomeofthedetails,wewanttopointoutthatenablingVMCPisextremelyeasy.Itcanbeenabledbyasingletickboxasshowninthescreenshotbelow.
Figure22-VirtualMachineComponentProtection
vSphere6.xHADeepdive
49RestartingVirtualMachines
AsstatedtherearetwoscenariosHAcanrespondto,PDLandAPD.Letslookatthosetwoscenariosabitcloser.WithvSphere5.0afeaturewasintroducedasanadvancedoptionthatwouldallowvSphereHAtorestartVMsimpactedbyaPDLcondition.
APDLcondition,isaconditionthatiscommunicatedbythearraycontrollertoESXiviaaSCSIsensecode.Thisconditionindicatesthatadevice(LUN)hasbecomeunavailableandislikelypermanentlyunavailable.AnexamplescenarioinwhichthisconditionwouldbecommunicatedbythearraywouldbewhenaLUNissetoffline.ThisconditionisusedduringafailurescenariotoensureESXitakesappropriateactionwhenaccesstoaLUNisrevoked.ItshouldbenotedthatwhenafullstoragefailureoccursitisimpossibletogeneratethePDLconditionasthereisnocommunicationpossiblebetweenthearrayandtheESXihost.ThisstatewillbeidentifiedbytheESXihostasanAPDcondition.
Althoughthefunctionalityitselfworkedasadvertised,enablingandmanagingitwascumbersomeanderrorprone.Itwasrequiredtosettheoption“disk.terminateVMOnPDLDefault”manually.WithvSphere6.0asimpleoptionintheWebClientisintroducedwhichallowsyoutospecifywhattheresponseshouldbetoaPDLsensecode.
Figure23-EnablingVirtualMachineComponentProtection
Thetwooptionsprovidedare“IssueEvents”and“PoweroffandrestartVMs”.Notethat“PoweroffandrestartVMs”doesexactlythat,yourVMprocessiskilledandtheVMisrestartedonahostwhichstillhasaccesstothestoragedevice.
UntilnowitwasnotpossibleforvSpheretorespondtoanAPDscenario.APDisthesituationwherethestoragedeviceisinaccessiblebutforunknownreasons.Inmostcaseswherethisoccursitistypicallyrelatedtoastoragenetworkproblem.WithvSphere5.1changeswereintroducedtothewayAPDscenarioswerehandledbythehypervisor.ThismechanismisleveragedbyHAtoallowforaresponse.
WhenanAPDoccursatimerstarts.After140secondstheAPDisdeclaredandthedeviceismarkedasAPDtimeout.Whenthe140secondshaspassedHAwillstartcounting.TheHAtimeoutis3minutesbydefaultatshowninFigure24.Whenthe3minuteshaspassed
vSphere6.xHADeepdive
50RestartingVirtualMachines
HAwilltaketheactiondefined.Thereareagaintwooptions“IssueEvents”and“PoweroffandrestartVMs”.
YoucanalsospecifyhowaggressivelyHAneedstotrytorestartVMsthatareimpactedbyanAPD.Notethataggressive/conservativereferstothelikelihoodofHAbeingabletorestartVMs.Whensetto“conservative”HAwillonlyrestarttheVMthatisimpactedbytheAPDifitknowsanotherhostcanrestartit.Inthecaseof“aggressive”HAwilltrytorestarttheVMevenifitdoesn’tknowthestateoftheotherhosts,whichcouldleadtoasituationwhereyourVMisnotrestartedasthereisnohostthathasaccesstothedatastoretheVMislocatedon.
ItisalsogoodtoknowthatiftheAPDisliftedandaccesstothestorageisrestoredduringthetotaloftheapproximate5minutesand20secondsitwouldtakebeforetheVMrestartisinitiated,thatHAwillnotdoanythingunlessyouexplicitlyconfigureitdoso.Thisiswherethe“ResponseforAPDrecoveryafterAPDtimeout”comesintoplay.IfthereisadesiretodosoyoucanrestarttheVMevenwhenthehosthasrecoveredfromtheAPDscenario,duringthe3minute(defaultvalue)graceperiod.
Basicdesignprinciple:Withoutaccesstosharedstorageavirtualmachinebecomesuseless.ItishighlyrecommendedtoconfigureVMCPtoactonaPDLandAPDscenario.Werecommendtosetbothto“poweroffandrestartsVMs”butleavethe“responseforAPDrecoveryafterAPDtimeout”disabledsothatVMsarenotrebootedunnecessarrily.
vSphere HA nuggets

Prior to vSphere 5.5, HA did nothing with VM-VM affinity or anti-affinity rules. Typically, for people using "affinity" rules this was not an issue, but those using "anti-affinity" rules did see this as a problem. They created these rules to ensure specific virtual machines would never be running on the same host, but vSphere HA would simply ignore the rule when a failure had occurred and just place the VMs "randomly". With vSphere 5.5 this has changed! vSphere HA is now "anti-affinity" aware. In order to ensure anti-affinity rules are respected, you can set an advanced setting or, as of vSphere 6.0, configure this in the vSphere Web Client.
das.respectVmVmAntiAffinityRules - Values: "false" (default) and "true"
Note that this also means that when you configure anti-affinity rules, have this advanced setting configured to "true", and somehow there aren't sufficient hosts available to respect these rules, then the rules will still be respected, and it could result in HA not restarting a VM. Make sure you understand this potential impact when configuring this setting and these rules.
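The interaction described above can be made concrete with a short sketch. This is not VMware code — the function, VM and host names are hypothetical — but it illustrates how enforcing a VM-VM anti-affinity rule during restart placement can leave a VM without a restart target when only one compatible host survives.

```python
# Illustrative sketch only (not HA's actual placement code): restart
# placement that enforces VM-VM anti-affinity rules.

def place_vms(vms, hosts, anti_affinity_groups, respect_rules=True):
    """Return a {vm: host} placement; VMs that cannot be placed are omitted."""
    placement = {}
    for vm in vms:
        candidates = list(hosts)
        if respect_rules:
            for group in anti_affinity_groups:
                if vm in group:
                    # Exclude hosts already chosen for a VM in the same group.
                    taken = {placement[v] for v in group if v in placement}
                    candidates = [h for h in candidates if h not in taken]
        if candidates:
            placement[vm] = candidates[0]
    return placement

# Two anti-affine VMs but only one surviving host: with the rule
# respected, vm02 is left unplaced (i.e. HA would not restart it).
strict = place_vms(["vm01", "vm02"], ["esxi03"], [{"vm01", "vm02"}])
relaxed = place_vms(["vm01", "vm02"], ["esxi03"], [{"vm01", "vm02"}],
                    respect_rules=False)
```

With the default ("false") behavior, both VMs end up on the single surviving host; with the rule respected, one stays down — exactly the trade-off to weigh before enabling the setting.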
With vSphere 6.0, support for respecting VM-Host affinity rules has been included. This is enabled through the use of an advanced setting called "das.respectVmHostSoftAffinityRules". When this advanced setting is configured, vSphere HA will try to respect the rules when it can. If there are any hosts in the cluster which belong to the same VM-Host group, then HA will restart the respective VM on one of those hosts. As this is a "should rule", HA has the ability to ignore the rule when needed. If none of the hosts in the VM-Host should rule are available, HA will restart the VM on any other host in the cluster.
das.respectVmHostSoftAffinityRules - Values: "false" (default) and "true"
ADD SCREENSHOT HERE!

# Restarting Virtual Machines
In the previous chapter, we have described most of the lower level fundamental concepts of HA. We have shown you that multiple mechanisms increase the resiliency and reliability of HA. Reliability of HA in this case mostly refers to restarting (or resetting) virtual machines, as that remains HA's primary task.
HA will respond when the state of a host has changed, or, better said, when the state of one or more virtual machines has changed. There are multiple scenarios in which HA will respond to a virtual machine failure, the most common of which are listed below:

- Failed host
- Isolated host
- Failed guest operating system
Depending on the type of failure, but also depending on the role of the host, the process will differ slightly. Changing the process results in slightly different recovery timelines. There are many different scenarios and there is no point in covering all of them, so we will try to describe the most common scenarios and include timelines where possible.

Before we dive into the different failure scenarios, we want to explain how restart priority and retries work.
Restart Priority and Order

HA can take the configured priority of the virtual machine into account when restarting VMs. However, it is good to know that agent VMs take precedence during the restart procedure, as the "regular" virtual machines may rely on them. A good example of an agent virtual machine is a virtual storage appliance.

Prioritization is done by each host and not globally. Each host that has been requested to initiate restart attempts will attempt to restart all top priority virtual machines before attempting to start any other virtual machines. If the restart of a top priority virtual machine fails, it will be retried after a delay. In the meantime, however, HA will continue powering on the remaining virtual machines. Keep in mind that some virtual machines might be dependent on the agent virtual machines. You should document which virtual machines are dependent on which agent virtual machines, and document the process to start up these services in the right order in case the automatic restart of an agent virtual machine fails.
Basic design principle: Virtual machines can be dependent on the availability of agent virtual machines or other virtual machines. Although HA will do its best to ensure all virtual machines are started in the correct order, this is not guaranteed. Document the proper recovery process.
Besides agent virtual machines, HA also prioritizes FT secondary machines. We have listed the full order in which virtual machines will be restarted below:

- Agent virtual machines
- FT secondary virtual machines
- Virtual machines configured with a restart priority of high
- Virtual machines configured with a medium restart priority
- Virtual machines configured with a low restart priority
It should be noted that HA will not place any virtual machines on a host if the required number of agent virtual machines is not running on that host at the time placement is done.
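The five-tier order listed above can be expressed as a simple sort key. A small illustrative sketch (the VM names and category labels are made up for the example):

```python
# Illustrative sketch: sort VMs into HA's restart order.
RESTART_ORDER = ["agent", "ft-secondary", "high", "medium", "low"]

def restart_order(vms):
    """vms is a list of (name, category) pairs; returns names in restart order."""
    return [name for name, category in
            sorted(vms, key=lambda vm: RESTART_ORDER.index(vm[1]))]

queue = restart_order([("web01", "low"), ("vsa01", "agent"),
                       ("db01", "high"), ("db01-ft", "ft-secondary"),
                       ("app01", "medium")])
# The virtual storage appliance and FT secondary come first.
```

Remember that, as stated above, this prioritization happens per host, not globally across the cluster.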
Now that we have briefly touched on it, we would also like to address "restart retries" and the parallelization of restarts, as that more or less dictates how long it could take before all virtual machines of a failed or isolated host are restarted.
Restart Retries

The number of retries is configurable as of vCenter 2.5 U4 with the advanced option "das.maxvmrestartcount". The default value is 5. Note that the initial restart is included.
HA will try to start the virtual machine on one of your hosts in the affected cluster; if this is unsuccessful on that host, the restart count will be increased by 1. Before we go into the exact timeline, let it be clear that T0 is the point at which the master initiates the first restart attempt. This by itself could be 30 seconds after the virtual machine has failed. The elapsed time between the failure of the virtual machine and the restart, though, will depend on the scenario of the failure, which we will discuss in this chapter.

As said, the default number of restarts is 5. There are specific times associated with each of these attempts. The following list will clarify this concept. The "m" stands for "minutes" in this list.
- T0 – Initial restart
- T2m – Restart retry 1
- T6m – Restart retry 2
- T14m – Restart retry 3
- T30m – Restart retry 4
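The schedule above follows a simple pattern: the wait doubles after every failed attempt (2, 4, 8, then 16 minutes). A quick sketch that reproduces the timeline for the default das.maxvmrestartcount of 5:

```python
def restart_attempt_times(max_restart_count=5):
    """Minutes after T0 at which each restart attempt is initiated.
    The wait between attempts doubles: 2, 4, 8, 16 minutes."""
    times, t, delay = [], 0, 2
    for _ in range(max_restart_count):
        times.append(t)
        t += delay
        delay *= 2
    return times

# Default of 5 attempts (initial restart included): T0, T2m, T6m, T14m, T30m.
schedule = restart_attempt_times()
```

The doubling pattern matches the bullet list; how the schedule behaves beyond five attempts with a raised das.maxvmrestartcount is not documented here, so the sketch only claims correctness for the default.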
Figure 24 - High Availability restart timeline
As clearly depicted in the diagram above, a successful power-on attempt could take up to ~30 minutes in the case where multiple power-on attempts are unsuccessful. This is, however, not exact science. For instance, there is a 2-minute waiting period between the initial restart and the first restart retry. HA will start the 2-minute wait as soon as it has detected that the initial attempt has failed. So, in reality, T2 could be T2 plus 8 seconds. Another important fact that we want to emphasize is that there is no coordination between masters, so if multiple masters are involved in trying to restart the virtual machine, each will retain its own sequence. Multiple masters could attempt to restart a virtual machine. Although only one will succeed, it might change some of the timelines.
Let's give an example to clarify the scenario in which a master fails during a restart sequence:

Cluster: 4 hosts (esxi01, esxi02, esxi03, esxi04)

Master: esxi01
The host "esxi02" is running a single virtual machine called "vm01" and it fails. The master, esxi01, will try to restart it but the attempt fails. It will try restarting "vm01" up to 5 times but, unfortunately, on the 4th try, the master also fails. An election occurs and "esxi03" becomes the new master. It will now initiate the restart of "vm01", and if that restart fails it will retry it up to 4 times again, for a total of 5 including the initial restart.

Be aware, though, that a successful restart might never occur if the restart count is reached and all five restart attempts (the default value) were unsuccessful.
When it comes to restarts, one thing that is very important to realize is that HA will not issue more than 32 concurrent power-on tasks on a given host. To make that more clear, let's use the example of a two host cluster: if a host fails which contained 33 virtual machines and all of these had the same restart priority, 32 power-on attempts would be initiated. The 33rd power-on attempt will only be initiated when one of those 32 attempts has completed, regardless of the success or failure of that attempt.

Now, here comes the gotcha. If there are 32 low-priority virtual machines to be powered on and a single high-priority virtual machine, the power-on attempts for the low-priority virtual machines will not be issued until the power-on attempt for the high-priority virtual machine has completed. Let it be absolutely clear that HA does not wait to restart the low-priority virtual machines until the high-priority virtual machines are started; it waits for the issued power-on attempt to be reported as "completed". In theory, this means that if the power-on attempt fails, the low-priority virtual machines could be powered on before the high-priority virtual machine.

The restart priority, however, does guarantee that when a placement is done, the higher priority virtual machines get first right to any available resources.
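The 32-task limit can be made concrete with a small sketch. This models only the behavior described above — a slot frees up when an outstanding attempt completes, successfully or not — and is not HA's actual scheduler:

```python
# Illustrative sketch of the per-host power-on throttle described above.
MAX_CONCURRENT_POWER_ONS = 32

def initial_power_on_batch(restart_list, limit=MAX_CONCURRENT_POWER_ONS):
    """Split the restart list into attempts issued immediately and attempts
    that must wait for one of the outstanding attempts to complete
    (regardless of whether that attempt succeeds or fails)."""
    return restart_list[:limit], restart_list[limit:]

# A failed host with 33 equal-priority VMs: 32 attempts start, 1 waits.
issued, waiting = initial_power_on_batch([f"vm{i:02d}" for i in range(33)])
```

Note how this also explains the gotcha: a high-priority VM at the front of the list occupies a slot until its attempt completes, but a failed attempt still counts as "completed" and releases the slot to lower-priority VMs.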
Basic design principle: Configuring the restart priority of a virtual machine is not a guarantee that virtual machines will actually be restarted in this order. Ensure proper operational procedures are in place for restarting services or virtual machines in the appropriate order in the event of a failure.

Now that we know how virtual machine restart priority and restart retries are handled, it is time to look at the different scenarios:
- Failed host
  - Failure of a master
  - Failure of a slave
- Isolated host and response
Failed Host

When discussing a failed host scenario, it is necessary to make a distinction between the failure of a master and the failure of a slave. We want to emphasize this because the time it takes before a restart attempt is initiated differs between these two scenarios. Although the majority of you probably won't notice the time difference, it is important to call out. Let's start with the most common failure, that of a host failing, but note that failures generally occur infrequently. In most environments, hardware failures are very uncommon to begin with. Just in case it happens, it doesn't hurt to understand the process and its associated timelines.
The Failure of a Slave

The failure of a slave host is a fairly complex scenario. Part of this complexity comes from the introduction of a new heartbeat mechanism. Actually, there are two different scenarios: one where heartbeat datastores are configured and one where heartbeat datastores are not configured. Keeping in mind that this is an actual failure of the host, the timeline is as follows:
- T0 – Slave failure.
- T3s – Master begins monitoring datastore heartbeats for 15 seconds.
- T10s – The host is declared unreachable and the master will ping the management network of the failed host. This is a continuous ping for 5 seconds.
- T15s – If no heartbeat datastores are configured, the host will be declared dead.
- T18s – If heartbeat datastores are configured, the host will be declared dead.
The master monitors the network heartbeats of a slave. When the slave fails, these heartbeats will no longer be received by the master. We have defined this as T0. After 3 seconds (T3s), the master will start monitoring for datastore heartbeats and it will do this for 15 seconds. On the 10th second (T10s), when no network or datastore heartbeats have been detected, the host will be declared as "unreachable". The master will also start pinging the management network of the failed host at the 10th second, and it will do so for 5 seconds. If no heartbeat datastores were configured, the host will be declared "dead" at the 15th second (T15s) and virtual machine restarts will be initiated by the master. If heartbeat datastores have been configured, the host will be declared dead at the 18th second (T18s) and restarts will be initiated. We realize that this can be confusing and hope the timeline depicted in the diagram below makes it easier to digest.
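The same timeline, summarized as a small lookup (a sketch of the numbers in the text, not HA code):

```python
def slave_failure_timeline(heartbeat_datastores_configured=True):
    """Seconds after the last received network heartbeat (T0) at which
    each event in a slave-failure scenario occurs."""
    return {
        "master starts monitoring datastore heartbeats": 3,
        "host declared unreachable, management-network pings start": 10,
        "host declared dead, restarts initiated":
            18 if heartbeat_datastores_configured else 15,
    }
```

The only difference heartbeat datastores make to this timeline is the 3 extra seconds before the host is declared dead.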
Figure 25 - Restart timeline, slave failure
The master filters the virtual machines it thinks failed before initiating restarts. The master uses the protected list for this; on-disk state could be obtained by only one master at a time, since it required opening the protected list file in exclusive mode. If there is a network partition, multiple masters could try to restart the same virtual machine, as vCenter Server also provided the necessary details for a restart. As an example, it could happen that one master has locked a virtual machine's home datastore and has access to the protected list, while the other master is in contact with vCenter Server and as such is aware of the current desired protected state. In this scenario, the master which does not own the home datastore of the virtual machine could restart the virtual machine based on the information provided by vCenter Server.

This change in behavior was introduced to avoid the scenario where a restart of a virtual machine would fail due to insufficient resources in the partition which was responsible for the virtual machine. With this change, there is less chance of such a situation occurring, as the master in the other partition would use the information provided by vCenter Server to initiate the restart.
That leaves us with the question of what happens in the case of the failure of a master.

The Failure of a Master

In the case of a master failure, the process and the associated timeline are slightly different. The reason is that there needs to be a master before any restart can be initiated. This means that an election will need to take place amongst the slaves. The timeline is as follows:

- T0 – Master failure.
- T10s – Master election process initiated.
- T25s – New master elected; it reads the protected list.
- T35s – New master initiates restarts for all virtual machines on the protected list which are not running.

Slaves receive network heartbeats from their master. If the master fails, let's define this as T0 (T zero), the slaves detect this when the network heartbeats cease to be received. As every cluster needs a master, the slaves will initiate an election at T10s. The election process takes 15s to complete, which brings us to T25s. At T25s, the new master reads the protected list. This list contains all the virtual machines which are protected by HA. At T35s, the master initiates the restart of all virtual machines that are protected but not currently running. The timeline depicted in the diagram below hopefully clarifies the process.
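Putting the numbers from the text side by side (a sketch, not HA code) shows the cost of the election: with heartbeat datastores configured, a master failure delays restart initiation by roughly 17 seconds compared to a slave failure (T35s versus T18s).

```python
def master_failure_timeline():
    """Seconds after the master's network heartbeats cease (T0)."""
    return {
        "election initiated": 10,
        "new master elected, reads protected list": 25,
        "restarts initiated for protected, not-running VMs": 35,
    }

# Election overhead versus a slave failure with heartbeat datastores (18 s).
extra_delay = (
    master_failure_timeline()["restarts initiated for protected, not-running VMs"] - 18
)
```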
Figure 26 - Restart timeline, master failure

Besides the failure of a host, there is another reason for restarting virtual machines: an isolation event.
Isolation Response and Detection

Before we discuss the timeline and the process around the restart of virtual machines after an isolation event, we will discuss Isolation Response and Isolation Detection. One of the first decisions that will need to be made when configuring HA is the "Isolation Response".
Isolation Response
The Isolation Response refers to the action that HA takes for its virtual machines when the host has lost its connection with the network and the remaining nodes in the cluster. This does not necessarily mean that the whole network is down; it could just be the management network ports of this specific host. Today there are three isolation responses: "Power off", "Leave powered on" and "Shutdown". The isolation response answers the question, "what should a host do with the virtual machines it manages when it detects that it is isolated from the network?" Let's discuss these three options more in-depth:

- Power off – When isolation occurs, all virtual machines are powered off. It is a hard stop, or, to put it bluntly, the "virtual" power cable of the virtual machine will be pulled out!
- Shutdown – When isolation occurs, all virtual machines running on the host will be shut down using a guest-initiated shutdown through VMware Tools. If this is not successful within 5 minutes, a "power off" will be executed. This time-out value can be adjusted by setting the advanced option das.isolationShutdownTimeout. If VMware Tools is not installed, a "power off" will be initiated immediately.
- Leave powered on – When isolation occurs on the host, the state of the virtual machines remains unchanged.
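The decision logic of the three responses can be sketched as follows. This is illustrative only — the strings are made up, and the 300 seconds reflects the default das.isolationShutdownTimeout of 5 minutes:

```python
def isolation_response_action(response, tools_installed=True,
                              guest_shutdown_completed=True,
                              shutdown_timeout=300):
    """Return the action taken for one VM when the isolation response fires."""
    if response == "leave powered on":
        return "no action"                    # VM state remains unchanged
    if response == "power off":
        return "hard power off"               # the "virtual" power cable is pulled
    # response == "shutdown"
    if not tools_installed:
        return "hard power off"               # no VMware Tools: power off immediately
    if guest_shutdown_completed:
        return "guest shutdown"
    return "hard power off after %d s" % shutdown_timeout
```

In other words, "Shutdown" is best-effort: it degrades to a hard power off whenever a clean guest shutdown cannot be performed or does not finish in time.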
This setting can be changed in the cluster settings under virtual machine options.

Figure 27 - Cluster default settings
The default setting for the isolation response has changed multiple times over the last couple of years, and this has caused some confusion:

- Up to ESXi 3.5 U2 / vCenter 2.5 U2, the default isolation response was "Power off"
- With ESXi 3.5 U3 / vCenter 2.5 U3, this was changed to "Leave powered on"
- With vSphere 4.0, it was changed to "Shutdown"
- With vSphere 5.0, it was changed back to "Leave powered on"

Keep in mind that these changes are only applicable to newly created clusters. When creating a new cluster, it may be required to change the default isolation response based on the configuration of existing clusters and/or your customer's requirements, constraints and expectations. When upgrading an existing cluster, it might be wise to apply the latest default values. You might wonder why the default has changed once again. There was a lot of feedback from customers that "Leave powered on" was the desired default value.
Basic design principle: Before upgrading an environment to later versions, ensure you validate the best practices and default settings. Document them, including the justification, to ensure all people involved understand your reasons.
The question remains, which setting should be used? The obvious answer applies here: it depends. We prefer "Leave powered on" because it eliminates the chance of having a false positive and its associated downtime. One of the problems that people experienced in the past is that HA triggered its isolation response when the full management network went down, basically resulting in the power off (or shutdown) of every single virtual machine and none being restarted. This problem has been mitigated. HA will validate whether virtual machine restarts can be attempted – there is no reason to incur any downtime unless absolutely necessary. It does this by validating that a master owns the datastore the virtual machine is stored on. Of course, the isolated host can only validate this if it has access to the datastores. In a converged network environment with iSCSI storage, for instance, it would be impossible to validate this during a full isolation, as the validation would fail due to the datastore being inaccessible from the perspective of the isolated host.

We feel that changing the isolation response is most useful in environments where a failure of the management network is likely correlated with a failure of the virtual machine network(s). If the failure of the management network won't likely correspond with the failure of the virtual machine networks, the isolation response would cause unnecessary downtime, as the virtual machines can continue to run without management network connectivity to the host.

A second use for power off/shutdown is in scenarios where the virtual machine retains access to the virtual machine network but loses access to its storage; leaving the virtual machine powered on could result in two virtual machines on the network with the same IP address.

It is still difficult to decide which isolation response should be used. The following table was created to provide some more guidelines.
| Likelihood that host will retain access to VM datastore | Likelihood VMs will retain access to VM network | Recommended Isolation Policy | Rationale |
| --- | --- | --- | --- |
| Likely | Likely | Leave Powered On | Virtual machine is running fine, no reason to power it off |
| Likely | Unlikely | Either Leave Powered On or Shutdown | Choose Shutdown to allow HA to restart virtual machines on hosts that are not isolated and hence are likely to have access to storage |
| Unlikely | Likely | Power Off | Use Power Off to avoid having two instances of the same virtual machine on the virtual machine network |
| Unlikely | Unlikely | Leave Powered On or Power Off | Leave Powered On if the virtual machine can recover from the network/datastore outage if it is not restarted because of the isolation, and Power Off if it likely can't |
The question that we haven't answered yet is how HA knows which virtual machines have been powered off due to the triggered isolation response, and why the isolation response is more reliable than with previous versions of HA. Previously, HA did not care and would always try to restart the virtual machines according to the last known state of the host. That is no longer the case. Before the isolation response is triggered, the isolated host will verify whether a master is responsible for the virtual machine.

As mentioned earlier, it does this by validating whether a master owns the home datastore of the virtual machine. When the isolation response is triggered, the isolated host removes the virtual machines which are powered off or shut down from the "poweron" file. The master will recognize that the virtual machines have disappeared and initiate a restart. On top of that, when the isolation response is triggered, the host will create a per-virtual machine file under a "poweredoff" directory which indicates to the master that this virtual machine was powered down as a result of a triggered isolation response. This information will be read by the master node when it initiates the restart attempt, in order to guarantee that only virtual machines that were powered off / shut down by HA will be restarted by HA.
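The bookkeeping just described can be sketched in a few lines. This is an illustration of the logic, not HA's implementation: the master compares the host's "poweron" list against its last known state, and the per-VM power-off files tell it which disappearances were HA's doing.

```python
def vms_to_restart(last_known_running, current_poweron_list, poweroff_files):
    """VMs that disappeared from the host's "poweron" file AND were recorded
    as powered off by the isolation response are candidates for restart."""
    disappeared = set(last_known_running) - set(current_poweron_list)
    return sorted(disappeared & set(poweroff_files))

# vm01 was powered off by HA's isolation response; vm02 disappeared for some
# other reason (e.g. an administrator shut it down), so only vm01 is restarted.
restart = vms_to_restart(["vm01", "vm02", "vm03"], ["vm03"], ["vm01"])
```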
This is, however, only one part of the increased reliability of HA. Reliability has also been improved with respect to "isolation detection", which will be described in the following section.
Isolation Detection
We have explained what the options are to respond to an isolation event and what happens when the selected response is triggered. However, we have not extensively discussed how isolation is detected. The mechanism is fairly straightforward and works with heartbeats, as explained earlier. There are, however, two scenarios again, and the process and associated timelines differ for each of them:

- Isolation of a slave
- Isolation of a master

Before we explain the differences in process between both scenarios, we want to make sure it is clear that a change in state will result in the isolation response not being triggered in either scenario. Meaning that if a single ping is successful, or the host observes election traffic and is elected a master or slave, the isolation response will not be triggered, which is exactly what you want, as avoiding downtime is at least as important as recovering from downtime. When a host has declared itself isolated and then observes election traffic, it will declare itself no longer isolated.
Isolation of a Slave

HA triggers a master election process before it will declare a host isolated. In the below timeline, "s" refers to seconds.

- T0 – Isolation of the host (slave)
- T10s – Slave enters "election state"
- T25s – Slave elects itself as master
- T25s – Slave pings "isolation addresses"
- T30s – Slave declares itself isolated
- T60s – Slave "triggers" isolation response
When the isolation response is triggered, HA creates a "power-off" file for any virtual machine HA powers off whose home datastore is accessible. Next, it powers off the virtual machine (or shuts it down) and updates the host's "poweron" file. The power-off file is used to record that HA powered off the virtual machine and that HA should therefore restart it. These power-off files are deleted when a virtual machine is powered back on or HA is disabled.

After the completion of this sequence, the master will learn that the slave was isolated through the "poweron" file, as mentioned earlier, and will restart virtual machines based on the information provided by the slave.
Figure 28 - Isolation of a slave timeline
Isolation of a Master

In the case of the isolation of a master, the timeline is a bit less complicated because there is no need to go through an election process. In this timeline, "s" refers to seconds.

- T0 – Isolation of the host (master)
- T0 – Master pings "isolation addresses"
- T5s – Master declares itself isolated
- T35s – Master "triggers" isolation response
Additional Checks

Before a host declares itself isolated, it will ping the default isolation address, which is the gateway specified for the management network, and it will continue to ping the address until it becomes unisolated. HA gives you the option to define one or multiple additional isolation addresses using an advanced setting. This advanced setting is called das.isolationaddress and can be used to reduce the chance of having a false positive. We recommend setting an additional isolation address. If required, you can configure up to 10 additional isolation addresses. Note that a secondary management network will more than likely be on a different subnet; if one is configured, it is recommended to specify an additional isolation address which is part of that subnet.
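The detection check itself boils down to "did any isolation address reply?". A sketch with a stand-in ping function (the addresses and network state below are examples, not recommendations):

```python
def is_isolated(isolation_addresses, ping):
    """A host considers itself isolated only if none of its isolation
    addresses respond; ping is a stand-in for a real ICMP check."""
    return not any(ping(address) for address in isolation_addresses)

reachable = {"192.168.1.1"}                 # hypothetical network state
ping = lambda address: address in reachable

# Default gateway only, versus gateway plus an extra das.isolationaddress:
gateway_only = is_isolated(["10.0.0.1"], ping)                # isolated
with_extra = is_isolated(["10.0.0.1", "192.168.1.1"], ping)   # not isolated
```

This is why an additional address reduces false positives: a single successful reply, from any of the configured addresses, is enough to keep the isolation response from triggering.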
Figure 29 - Isolation Address
Selecting an Additional Isolation Address

A question asked by many people is which address should be specified for this additional isolation verification. We generally recommend an isolation address close to the hosts, to avoid too many network hops, and an address that would correlate with the liveness of the virtual machine network. In many cases, the most logical choice is the physical switch to which the host is directly connected. Basically, use the gateway for whatever subnet your management network is on. Another usual suspect would be a router or any other reliable and pingable device on the same subnet. However, when you are using IP-based shared storage like NFS or iSCSI, the IP address of the storage device can also be a good choice.

Basic design principle: Select a reliable secondary isolation address. Try to minimize the number of "hops" between the host and this address.
Isolation Policy Delay

For those who want to increase the time it takes before HA executes the isolation response, an advanced setting is available. This setting is called "das.config.fdm.isolationPolicyDelaySec" and allows you to change the number of seconds to wait before the isolation policy is executed. The minimum value is 30; if set to a value less than 30, the delay will still be 30 seconds. We do not recommend changing this advanced setting unless there is a specific requirement to do so. In almost all scenarios, 30 seconds should suffice.
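The 30-second floor means the effective delay is simply a clamp of the configured value:

```python
def effective_isolation_policy_delay(configured_seconds):
    """das.config.fdm.isolationPolicyDelaySec is clamped to a 30 s minimum."""
    return max(30, configured_seconds)
```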
Restarting Virtual Machines

The most important procedure has not yet been explained: restarting virtual machines. We have dedicated a full section to this concept.
We have explained the difference in behavior, from a timing perspective, for restarting virtual machines in the case of both master node and slave node failures. For now, let's assume that a slave node has failed. When the master node declares the slave node Partitioned or Isolated, it determines which virtual machines were running on it using the information it previously read from the host's "poweron" file. These files are read asynchronously, approximately every 30s. If the host was not Partitioned or Isolated before the failure, the master uses cached data to determine the virtual machines that were last running on the host before the failure occurred.
Before it initiates the restart attempts, though, the master will first validate that the virtual machine should be restarted. This validation uses the protection information vCenter Server provides to each master, or, if the master is not in contact with vCenter Server, the information saved in the protected list files. If the master is not in contact with vCenter Server or has not locked the file, the virtual machine is filtered out. At this point, all virtual machines having a restart priority of "disabled" are also filtered out.

Now that HA knows which virtual machines it should restart, it is time to decide where the virtual machines are placed. HA will take multiple things into account:

- CPU and memory reservation, including the memory overhead of the virtual machine
- Unreserved capacity of the hosts in the cluster
- Restart priority of the virtual machine relative to the other virtual machines that need to be restarted
- Virtual-machine-to-host compatibility set
- The number of dvPorts required by a virtual machine and the number available on the candidate hosts
- The maximum number of vCPUs and virtual machines that can be run on a given host
- Restart latency
- Whether the active hosts are running the required number of agent virtual machines
Restart latency refers to the amount of time it takes to initiate virtual machine restarts. Virtual machine restarts will be distributed by the master across multiple hosts to avoid a boot storm, and thus a delay, on a single host.

If a placement is found, the master will send each target host the set of virtual machines it needs to restart. If this list exceeds 32 virtual machines, HA will limit the number of concurrent power-on attempts to 32. If a virtual machine successfully powers on, the node on which the virtual machine was powered on will inform the master of the change in power state. The master will then remove the virtual machine from the restart list.

If a placement cannot be found, the master will place the virtual machine on a "pending placement list" and will retry placement of the virtual machine when one of the following conditions changes:
- A new virtual-machine-to-host compatibility list is provided by vCenter.
- A host reports that its unreserved capacity has increased.
- A host (re)joins the cluster (for instance, when a host is taken out of maintenance mode or a host is added to the cluster).
- A new failure is detected and virtual machines have to be failed over.
- A failure occurred when failing over a virtual machine.

But what about DRS? Wouldn't DRS be able to help during the placement of virtual machines when all else fails? It does. The master node will report to vCenter the set of virtual machines that were not placed due to insufficient resources, as is the case today. If DRS is enabled, this information will be used in an attempt to have DRS make capacity available.
Component Protection

In vSphere 6.0, a new feature is introduced as part of vSphere HA called VM Component Protection. VM Component Protection (VMCP) in vSphere 6.0 allows you to protect virtual machines against the failure of your storage system. There are two types of failures VMCP will respond to: Permanent Device Loss (PDL) and All Paths Down (APD). Before we look at some of the details, we want to point out that enabling VMCP is extremely easy. It can be enabled with a single tick box, as shown in the screenshot below.
Figure 30 - Virtual Machine Component Protection
As stated, there are two scenarios HA can respond to: PDL and APD. Let's look at those two scenarios a bit closer. With vSphere 5.0, a feature was introduced as an advanced option that would allow vSphere HA to restart VMs impacted by a PDL condition.

A PDL condition is a condition that is communicated by the array controller to ESXi via a SCSI sense code. This condition indicates that a device (LUN) has become unavailable and is likely permanently unavailable. An example scenario in which this condition would be communicated by the array would be when a LUN is set offline. This condition is used during a failure scenario to ensure ESXi takes appropriate action when access to a LUN is revoked. It should be noted that when a full storage failure occurs, it is impossible to generate the PDL condition, as there is no communication possible between the array and the ESXi host. This state will be identified by the ESXi host as an APD condition.

Although the functionality itself worked as advertised, enabling and managing it was cumbersome and error prone. It was required to set the option "disk.terminateVMOnPDLDefault" manually. With vSphere 6.0, a simple option is introduced in the Web Client which allows you to specify what the response should be to a PDL sense code.
Figure31-EnablingVirtualMachineComponentProtection
Thetwooptionsprovidedare“IssueEvents”and“PoweroffandrestartVMs”.Notethat“PoweroffandrestartVMs”doesexactlythat,yourVMprocessiskilledandtheVMisrestartedonahostwhichstillhasaccesstothestoragedevice.
UntilnowitwasnotpossibleforvSpheretorespondtoanAPDscenario.APDisthesituationwherethestoragedeviceisinaccessiblebutforunknownreasons.Inmostcaseswherethisoccursitistypicallyrelatedtoastoragenetworkproblem.WithvSphere5.1changeswereintroducedtothewayAPDscenarioswerehandledbythehypervisor.ThismechanismisleveragedbyHAtoallowforaresponse.
WhenanAPDoccursatimerstarts.After140secondstheAPDisdeclaredandthedeviceismarkedasAPDtimeout.Whenthe140secondshaspassedHAwillstartcounting.TheHAtimeoutis3minutesbydefaultatshowninFigure24.Whenthe3minuteshaspassed
HA will take the action defined. There are again two options: "Issue events" and "Power off and restart VMs".

You can also specify how aggressively HA needs to try to restart VMs that are impacted by an APD. Note that aggressive/conservative refers to the likelihood of HA being able to restart VMs. When set to "conservative", HA will only restart the VM that is impacted by the APD if it knows another host can restart it. In the case of "aggressive", HA will try to restart the VM even if it doesn't know the state of the other hosts, which could lead to a situation where your VM is not restarted, as there is no host that has access to the datastore the VM is located on.

It is also good to know that if the APD is lifted and access to the storage is restored during the approximately 5 minutes and 20 seconds it would take before the VM restart is initiated, HA will not do anything unless you explicitly configure it to do so. This is where the "Response for APD recovery after APD timeout" setting comes into play. If there is a desire to do so, you can restart the VM even when the host has recovered from the APD scenario during the 3 minute (default value) grace period.
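The two timers described above add up to the roughly 5 minutes and 20 seconds mentioned. A minimal sketch of that arithmetic, purely illustrative and not a VMware API:

```python
# Illustrative arithmetic only (not a VMware API): the total delay from
# the start of an APD condition until vSphere HA initiates a VM restart,
# using the default values discussed above.

APD_TIMEOUT_SECONDS = 140      # device declared "APD timeout" after 140s
VMCP_DELAY_SECONDS = 3 * 60    # HA then waits another 3 minutes by default

def seconds_until_restart(apd_timeout=APD_TIMEOUT_SECONDS,
                          vmcp_delay=VMCP_DELAY_SECONDS):
    """Total delay before HA takes the configured VMCP action."""
    return apd_timeout + vmcp_delay

total = seconds_until_restart()
print(f"{total} seconds = {total // 60}m{total % 60:02d}s")  # 320 seconds = 5m20s
```

Changing either timeout (both are configurable) shifts the window in which an APD recovery will cancel the restart.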
Basic design principle: Without access to shared storage a virtual machine becomes useless. It is highly recommended to configure VMCP to act on both PDL and APD scenarios. We recommend setting both to "Power off and restart VMs", but leaving the "Response for APD recovery after APD timeout" disabled so that VMs are not rebooted unnecessarily.

vSphere HA nuggets

Prior to vSphere 5.5, HA did nothing with VM to VM affinity or anti-affinity rules. Typically for people using "affinity" rules this was not an issue, but those using "anti-affinity" rules did see this as an issue. They created these rules to ensure specific virtual machines would never be running on the same host, but vSphere HA would simply ignore the rule when a failure had occurred and just place the VMs "randomly". With vSphere 5.5 this has changed! vSphere HA is now "anti-affinity" aware. In order to ensure anti-affinity rules are respected, you can set an advanced setting or, as of vSphere 6.0, configure it in the vSphere Web Client.

das.respectVmVmAntiAffinityRules - Values: "false" (default) and "true"

Now note that this also means that when you configure anti-affinity rules and have this advanced setting configured to "true", and somehow there aren't sufficient hosts available to respect these rules... the rules will still be respected, and it could result in HA not restarting a VM. Make sure to understand this potential impact when configuring this setting and these rules.
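To make that trade-off concrete, here is a conceptual sketch of anti-affinity-aware placement. This is not vSphere HA's actual algorithm, and the host and VM names are made up; the point is simply that a strictly respected rule can leave a VM with no valid restart target:

```python
# Conceptual sketch (NOT vSphere HA's real placement algorithm): when an
# anti-affinity rule must be respected, a VM can only be restarted on a
# host that runs no other member of its rule group.

def pick_restart_host(vm, candidate_hosts, anti_affinity_group, placements):
    """Return the first host not already running a member of the VM's
    anti-affinity group, or None if the rule cannot be respected."""
    for host in candidate_hosts:
        peers_on_host = anti_affinity_group & placements.get(host, set())
        if not peers_on_host:
            return host
    return None  # rule respected -> the VM is simply not restarted

# Two surviving hosts, each already running one member of the rule group:
placements = {"esxi-01": {"app-a"}, "esxi-02": {"app-b"}}
rule = {"app-a", "app-b", "app-c"}
print(pick_restart_host("app-c", ["esxi-01", "esxi-02"], rule, placements))
```

With only these two hosts available the function returns `None`, which mirrors the scenario in the text: the rule is honored and the VM stays down.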
With vSphere 6.0, support for respecting VM to Host affinity rules has been included. This is enabled through the use of an advanced setting called "das.respectVmHostSoftAffinityRules". When this advanced setting is configured, vSphere HA will try to respect the rules when it can. If there are any hosts in the cluster which belong to the same VM-Host group, then HA will restart the respective VM on such a host. As this is a "should rule", HA has the ability to ignore the rule when needed. If there is a scenario where none of the hosts in the VM-Host should rule are available, HA will restart the VM on any other host in the cluster.

das.respectVmHostSoftAffinityRules - Values: "false" (default) and "true"

ADD SCREENSHOT HERE!
Virtual SAN and Virtual Volumes specifics

In the last couple of sections we have discussed the ins and outs of HA, all of it based on VMFS based or NFS based storage. With the introduction of Virtual SAN and Virtual Volumes also come changes to some of the discussed concepts.

HA and Virtual SAN

Virtual SAN is VMware's approach to Software Defined Storage. We are not going to explain the ins and outs of Virtual SAN, but do want to provide a basic understanding for those who have never done anything with it. Virtual SAN leverages host local storage and creates a shared datastore out of it.

Figure 32 - Virtual SAN Cluster
Virtual SAN requires a minimum of 3 hosts, and each of those 3 hosts will need to have 1 SSD for caching and 1 capacity device (which can be SSD or HDD). Only the capacity devices will contribute to the available capacity of the datastore. If you have 1 TB worth of capacity devices per host, then with three hosts the total size of your datastore will be 3 TB.

Having said that, with Virtual SAN 6.1 VMware introduced a "2-node" option. This 2-node option is actually 2 regular VSAN nodes with a third "witness" node.

The big differentiator between most storage systems and Virtual SAN is that availability of the virtual machine's storage is defined on a per virtual disk or per virtual machine basis. This is called "Failures To Tolerate" and can be configured to any value between 0 (zero) and 3. When configured to 0, the virtual machine will have only 1 copy of its virtual disks, which means that if a host fails where the virtual disks are stored, the virtual machine is lost. As such, all virtual machines are deployed by default with Failures To Tolerate (FTT) set to 1. A virtual disk is what VSAN refers to as an object. An object, when FTT is configured as 1 or higher, has multiple components. In the diagram below we demonstrate the FTT=1 scenario, where the virtual disk has 2 "data components" and a "witness component". The witness is used as a "quorum" mechanism.
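The capacity and component counts described above can be sketched in a few lines. This is a deliberately simplified model: the witness count follows the common 2n+1 quorum rule of thumb, and real VSAN object layouts can differ per policy and object size:

```python
# Simplified model of the Virtual SAN numbers discussed above; real VSAN
# placement is more sophisticated (this is not a VMware API).

def raw_capacity_tb(hosts, capacity_per_host_tb):
    """The capacity devices of all hosts pool into one shared datastore."""
    return hosts * capacity_per_host_tb

def object_components(ftt):
    """FTT=n requires n+1 data copies; witnesses bring the total to an
    odd number (2n+1 rule of thumb) so a majority can always be found."""
    data = ftt + 1
    total = 2 * ftt + 1
    return data, total - data  # (data components, witness components)

print(raw_capacity_tb(3, 1))   # 3 TB raw, matching the example above
print(object_components(1))    # (2, 1): 2 data components + 1 witness
```

Note that usable capacity is lower than raw capacity: FTT=1 stores each data component twice, so the same 3 TB raw holds roughly 1.5 TB of protected VM data.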
Figure 33 - Virtual SAN Object model
As the diagram above depicts, a virtual machine can be running on the first host while its storage components are on the remaining hosts in the cluster. As you can imagine, from an HA point of view this changes things, as access to the network is not only critical for HA to function correctly but also for Virtual SAN. When it comes to networking, note that when Virtual SAN is configured in a cluster, HA will use the same network for its communications (heartbeating etc.). On top of that, it is good to know that VMware highly recommends 10GbE to be used for Virtual SAN.

Basic design principle: 10GbE is highly recommended for Virtual SAN. As vSphere HA also leverages the Virtual SAN network and availability of VMs is dependent on network connectivity, ensure that at a minimum two 10GbE ports are used, along with two physical switches for resiliency.

The reason that HA uses the same network as Virtual SAN is simple: it is to avoid network partition scenarios where HA communications are separated from Virtual SAN and the state of the cluster is unclear. Note that you will need to ensure that there is a pingable isolation address on the Virtual SAN network, and this isolation address will need to be configured as such through the use of the advanced setting "das.isolationAddress0". We also recommend disabling the use of the default isolation address through the advanced setting "das.useDefaultIsolationAddress" (set to false).

When an isolation does occur, the isolation response is triggered as explained in earlier chapters. For Virtual SAN the recommendation is simple: configure the isolation response to "Power off, then failover". This is the safest option. Virtual SAN can be compared to the "converged network with IP based storage" example we provided: it is very easy to reach a situation where a host is isolated, all virtual machines remain running, but they are restarted on another host because the connection to the Virtual SAN datastore is lost.

Basic design principle: Configure your isolation address and your isolation policy accordingly. We recommend selecting "Power off" as the isolation policy and a reliable, pingable device as the isolation address.

What about things like heartbeat datastores and the folder structure that exists on a VMFS datastore; has any of that changed with Virtual SAN? Yes, it has. First of all, in a "Virtual SAN" only environment the concept of Heartbeat Datastores is not used at all. The reason for this is straightforward: as HA and Virtual SAN share the same network, it is safe to assume that when the HA heartbeat is lost because of a network failure, so is access to the Virtual SAN datastore. Only in an environment where there is also traditional storage will heartbeat datastores be configured, leveraging those traditional datastores as heartbeat datastores. Note that we do not feel there is a reason to introduce traditional storage just to provide HA this functionality; HA and Virtual SAN work perfectly fine without heartbeat datastores.
Normally HA metadata is stored in the root of the datastore. For Virtual SAN this is different, as the metadata is stored in the VM's namespace object. The protected list is held in memory and updated automatically when VMs are powered on or off.

Now you may wonder: what happens when there is an isolation? How does HA know where to start the VM that is impacted? Let's take a look at a partition scenario.

Figure 34 - VSAN Partition scenario

In this scenario a network problem has caused a cluster partition. Where a VM is restarted is determined by which partition owns the virtual machine files. Within a VSAN cluster this is fairly straightforward. There are two partitions, one of which is running the VM with its VMDK, while the other partition has a VMDK replica and a witness. Guess what happens? Right, VSAN uses the witness to see which partition has quorum, and based on that result one of the two partitions will win. In this case, Partition 2 has more than 50% of the components of this object and as such is the winner. This means that the VM will be
restarted on either "esxi-03" or "esxi-04" by vSphere HA. Note that the VM in Partition 1 will be powered off only if you have configured the isolation response to do so. We would like to stress that this is highly recommended! (Isolation response -> power off)
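The quorum decision just described can be illustrated with a few lines of Python. This is a simplification of VSAN's ownership logic, using the component counts from the partition scenario above:

```python
# Simplified illustration of the quorum decision described above: the
# partition holding a strict majority (>50%) of an object's components
# owns the object, and HA restarts the VM in that partition.

def winning_partition(components_by_partition):
    """components_by_partition maps partition name -> number of the
    object's components visible in that partition. Returns the partition
    with a strict majority, or None if no partition has quorum."""
    total = sum(components_by_partition.values())
    for partition, count in components_by_partition.items():
        if count * 2 > total:  # strictly more than 50%
            return partition
    return None

# Partition 1: the running VM's VMDK (1 component).
# Partition 2: the VMDK replica plus the witness (2 components).
print(winning_partition({"partition-1": 1, "partition-2": 2}))  # partition-2
```

An even split returns `None`, which is exactly why the witness component exists: it keeps the total component count odd so a majority can always be determined.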
HA and Virtual Volumes

Let us start with first describing what Virtual Volumes is and what value it brings for an administrator. Virtual Volumes was developed to make your life (as vSphere admin) and that of the storage administrator easier. This is done by providing a framework that enables the vSphere administrator to assign policies to virtual machines or virtual disks. In these policies, capabilities of the storage array can be defined. These capabilities can be things like snapshotting, deduplication, RAID level, thin/thick provisioning, etc. What is offered to the vSphere administrator is up to the storage administrator, and of course up to what the storage system can offer to begin with. When a virtual machine is deployed and a policy is assigned, the storage system will enable certain functionality of the array based on what was specified in the policy. So there is no longer a need to assign capabilities to a LUN which holds many VMs, but rather per VM or even per VMDK level control. So how does this work? Well, let's take a look at an architectural diagram first.
Figure 35 - Virtual Volumes Architecture

The diagram shows a couple of components which are important in the VVol architecture. Let's list them out:

- Protocol Endpoints, aka PE
- Virtual Datastore and a Storage Container
- Vendor Provider / VASA
- Policies
- Virtual Volumes

Let's take a look at all of these in the above order. Protocol Endpoints, what are they?

Protocol Endpoints are literally the access points to your storage system. All IO to virtual volumes is proxied through a Protocol Endpoint, and you can have 1 or more of these per storage system, if your storage system supports having multiple of course. (Implementations of different vendors will vary.) PEs are compatible with different protocols (FC, FCoE, iSCSI, NFS), and if you ask me, with Virtual Volumes that whole protocol discussion will come to an end. You
could see a Protocol Endpoint as a "mount point" or a device, and yes, they will count towards your maximum number of devices per host (256). (Virtual Volumes themselves won't count towards that!)

Next up is the Storage Container. This is the place where you store your virtual machines, or better said, where your virtual volumes end up. The Storage Container is a storage system logical construct and is represented within vSphere as a "virtual datastore". You need 1 per storage system, but you can have many when desired. To this Storage Container you can apply capabilities. So if you would like your virtual volumes to be able to use array based snapshots, then the storage administrator will need to assign that capability to the storage container. Note that a storage administrator can grow a storage container without even informing you. A storage container isn't formatted with VMFS or anything like that, so you don't need to increase the volume in order to use the space.

But how does vSphere know which container is capable of doing what? In order to discover a storage container and its capabilities, we need to be able to talk to the storage system first. This is done through the vSphere APIs for Storage Awareness. You simply point vSphere to the Vendor Provider, and the vendor provider will report to vSphere what's available; this includes both the storage containers as well as the capabilities they possess. Note that a single Vendor Provider can be managing multiple storage systems, which in their turn can have multiple storage containers with many capabilities. These vendor providers can also come in different flavours: for some storage systems it is part of their software, but for others it will come as a virtual appliance that sits on top of vSphere.

Now that vSphere knows which systems there are and which containers are available with which capabilities, you can start creating policies. These policies can be a combination of capabilities and will ultimately be assigned to virtual machines, or even virtual disks. You can imagine that in some cases you would like Quality of Service enabled to ensure performance for a VM, while in other cases it isn't as relevant, but you need to have a snapshot every hour. All of this is enabled through these policies. No longer will you be maintaining that spreadsheet with all your LUNs, which data services were enabled, and whatnot; no, you simply assign a policy. (Yes, a proper naming scheme will be helpful when defining policies.) When requirements change for a VM, you don't move the VM around; you change the policy and the storage system will do what is required in order to make the VM (and its disks) compliant again with the policy. Not the VM really, but the Virtual Volumes.
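As a rough illustration of this policy-driven model, the sketch below represents policies and container capabilities as plain dictionaries. The capability names ("snapshot_interval_hours", "qos_iops", "dedupe") and the compliance check are invented for this example; they are not part of any VVol or VASA API:

```python
# Hypothetical illustration of per-VM/per-VMDK policies replacing the
# "capabilities per LUN" model. Capability names and the matching rule
# are made up for this sketch; this is not a VVol API.

GOLD = {"snapshot_interval_hours": 1, "qos_iops": 5000, "dedupe": True}
SILVER = {"snapshot_interval_hours": 24, "dedupe": True}

def is_compliant(offered_capabilities, policy):
    """A container can back a policy only if it offers every capability
    the policy demands (values compared for an exact match here)."""
    return all(offered_capabilities.get(k) == v for k, v in policy.items())

# Capabilities one storage container reports via its vendor provider:
container = {"snapshot_interval_hours": 1, "qos_iops": 5000,
             "dedupe": True, "thin_provisioning": True}
print(is_compliant(container, GOLD))    # True
print(is_compliant(container, SILVER))  # False: snapshot interval differs
```

The point of the sketch: changing a VM's requirements means assigning a different policy, and it is the storage system's job to make the backing virtual volumes compliant again.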
Okay, those are the basics. Now, what about Virtual Volumes and vSphere HA? What changes when you are running Virtual Volumes, and what do you need to keep in mind when it comes to HA?

First of all, let me mention this: in some cases storage vendors have designed a solution where the "vendor provider" isn't designed in an HA fashion (VMware allows for Active/Active, Active/Standby or just "Active" as in a single instance). Make sure to validate
what kind of implementation your storage vendor has, as the Vendor Provider needs to be available when powering on VMs. The following quote explains why:

When a Virtual Volume is created, it is not immediately accessible for IO. To access Virtual Volumes, vSphere needs to issue a "Bind" operation to a VASA Provider (VP), which creates an IO access point for a Virtual Volume on a Protocol Endpoint (PE) chosen by the VP. A single PE can be the IO access point for multiple Virtual Volumes. An "Unbind" operation will remove this IO access point for a given Virtual Volume.

That is the "Virtual Volumes" implementation aspect, but of course things have also changed from a vSphere HA point of view. No longer do we have VMFS or NFS datastores to store files on or use for heartbeating. What changes from that perspective? First of all, a VM is carved up in different Virtual Volumes:

- VM Configuration
- Virtual Machine Disks
- Swap File
- Snapshots (if there are any)

Besides these different types of objects, when vSphere HA is enabled there is also a volume used by vSphere HA, and this volume will contain all the metadata which is normally stored under "/<root of datastore>/.vSphere-HA/<cluster-specific-directory>/" on regular VMFS. For each Fault Domain a separate folder will be created in this VVol.

All VM related HA files which normally would be under the VM folder, like for instance the power-off file, are now stored in the VM Configuration VVol object. Conceptually speaking this is similar to regular VMFS; implementation wise, however, it is completely different.

Another thing that changes with VVols is Heartbeat Datastores.

BEING WORKED ON - EARLY DRAFT
Adding Resiliency to HA (Network Redundancy)

In the previous chapter we extensively covered both Isolation Detection, which triggers the selected Isolation Response, and the impact of a false positive. The Isolation Response enables HA to restart virtual machines when "Power off" or "Shutdown" has been selected and the host becomes isolated from the network. However, this also means that it is possible that, without proper redundancy, the Isolation Response may be unnecessarily triggered. This leads to downtime and should be prevented.

To increase resiliency for networking, VMware implemented the concept of NIC teaming in the hypervisor for both VMkernel and virtual machine networking. When discussing HA, this is especially important for the Management Network.

NIC teaming is the process of grouping together several physical NICs into one single logical NIC, which can be used for network fault tolerance and load balancing.

Using this mechanism, it is possible to add redundancy to the Management Network to decrease the chances of an isolation event. This is, of course, also possible for other "Portgroups" but that is not the topic of this chapter or book. Another option is configuring an additional Management Network by enabling the "management network" tick box on another VMkernel port. A little understood fact is that if there are multiple VMkernel networks on the same subnet, HA will use all of them for management traffic, even if only one is specified for management traffic!

Although there are many configurations possible and supported, we recommend a simple but highly resilient configuration. We have included the vMotion (VMkernel) network in our example as combining the Management Network and the vMotion network on a single vSwitch is the most commonly used configuration and an industry accepted best practice.

Requirements:

- 2 physical NICs
- VLAN trunking

Recommended:

- 2 physical switches
- If available, enable "link state tracking" to ensure link failures are reported

The vSwitch should be configured as follows:
- vSwitch0: 2 physical NICs (vmnic0 and vmnic1)
- 2 portgroups (Management Network and vMotion VMkernel)
- Management Network active on vmnic0 and standby on vmnic1
- vMotion VMkernel active on vmnic1 and standby on vmnic0
- Failback set to No

Each portgroup has a VLAN ID assigned and runs dedicated on its own physical NIC; only in the case of a failure is it switched over to the standby NIC. We highly recommend setting failback to "No" to avoid the chance of an unwanted isolation event, which can occur when a physical switch routes no traffic during boot but the ports are reported as "up". (NIC Teaming tab)
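A small sketch to make the recommended layout checkable. The dictionary format is invented for this illustration; it is not an ESXi data structure or API:

```python
# Illustrative check of the active/standby teaming layout recommended
# above; the data structure is made up for this sketch (not an ESXi API).

vswitch0 = {
    "Management Network": {"active": ["vmnic0"], "standby": ["vmnic1"],
                           "failback": False},
    "vMotion":            {"active": ["vmnic1"], "standby": ["vmnic0"],
                           "failback": False},
}

def validate_teaming(portgroups):
    """Each portgroup needs a standby uplink and failback disabled, and
    the portgroups should not share the same active NIC, so each runs
    dedicated on its own physical NIC during normal operation."""
    for pg in portgroups.values():
        if not pg["standby"] or pg["failback"]:
            return False
    actives = [pg["active"][0] for pg in portgroups.values()]
    return len(set(actives)) == len(actives)

print(validate_teaming(vswitch0))  # True
```

A configuration with no standby uplink or with failback enabled would fail this check, which is exactly the kind of setup that invites an unnecessary isolation event.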
Pros: Only 2 NICs in total are needed for the Management Network and vMotion VMkernel, which is especially useful in blade server environments. Easy to configure.

Cons: Just a single active path for heartbeats.

The following diagram depicts this active/standby scenario:
Figure 36 - Active-Standby Management Network design

To increase resiliency, we also recommend implementing the following advanced settings and using NIC ports on different PCI busses, preferably NICs of a different make and model. When using a different make and model, even a driver failure could be mitigated.

Advanced settings: das.isolationaddressX = <ip-address>

The isolation address setting is discussed in more detail in the section titled "Fundamental Concepts". In short, it is the IP address that the HA agent pings to identify if the host is completely isolated from the network or just not receiving any heartbeats. If multiple VMkernel networks on different subnets are used, it is recommended to set an isolation address per network to ensure that each of these will be able to validate isolation of the host.
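The per-network logic can be sketched as follows. This is conceptual only: the addresses are examples, the reachability results are passed in rather than actually pinged, and real HA combines this with heartbeat traffic before declaring isolation:

```python
# Conceptual sketch only: a host should consider itself isolated only
# when EVERY configured isolation address is unreachable. Real HA also
# factors in missing heartbeats; here we just show the address logic.

def is_isolated(ping_results):
    """ping_results: dict of isolation address -> bool (reachable?).
    A single reachable address is enough to rule out full isolation."""
    return not any(ping_results.values())

# One isolation address per VMkernel subnet (das.isolationaddress0/1);
# the IPs below are made-up examples.
print(is_isolated({"192.168.1.1": False, "10.0.0.1": True}))   # False
print(is_isolated({"192.168.1.1": False, "10.0.0.1": False}))  # True
```

This also shows why one address per subnet matters: with only one address configured, a failure on that single subnet could look like full isolation even while the other network is healthy.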
Basic design principle: Take advantage of some of the basic features vSphere has to offer, like NIC teaming. Combining different physical NICs will increase the overall resiliency of your solution.
Corner Case Scenario: Split-Brain

A split brain scenario is a scenario where a single virtual machine is powered up multiple times, typically on two different hosts. This is possible in the scenario where the isolation response is set to "Leave powered on" and network based storage, like NFS / iSCSI and even Virtual SAN, is used. This situation can occur during a full network isolation, which may result in the lock on the virtual machine's VMDK being lost, enabling HA to actually power up the virtual machine. As the virtual machine was not powered off on its original host (isolation response set to "Leave powered on"), it will exist in memory on the isolated host and in memory with a disk lock on the host that was requested to restart the virtual machine.

Keep in mind that this truly is a corner case scenario which is very unlikely to occur in most environments. In case it does happen, HA relies on the "lost lock detection" mechanism to mitigate this scenario. In short, ESXi detects that the lock on the VMDK has been lost and, when the datastore becomes accessible again and the lock cannot be reacquired, issues a question whether the virtual machine should be powered off; HA automatically answers the question with Yes. However, you will only see this question if you directly connect to the ESXi host during the failure. HA will generate an event for this auto-answered question though.

As stated above, the question will be auto-answered and the virtual machine will be powered off to recover from the split brain scenario. The question still remains: in the case of an isolation with iSCSI or NFS, should you power off virtual machines or leave them powered on?

As just explained, HA will automatically power off your original virtual machine when it detects a split-brain scenario. This process, however, is not instantaneous, and as such it is recommended to use the isolation response of "Power off". We also recommend increasing heartbeat network resiliency to avoid getting into this situation. We will discuss the options you have for enhancing Management Network resiliency in the next chapter.

Link State Tracking

This was already briefly mentioned in the list of recommendations, but this feature is something we would like to emphasize. We have noticed that people often forget about it, even though many switches offer this capability, especially in blade server environments.

Link state tracking will mirror the state of an upstream link to a downstream link. Let's clarify that with a diagram.
Figure 37 - Link State Tracking mechanism

The diagram above depicts a scenario where an uplink of a "Core Switch" has failed. Without Link State Tracking, the connection from the "Edge Switch" to vmnic0 will be reported as up. With Link State Tracking enabled, the state of the link on the "Edge Switch" will reflect the state of the link of the "Core Switch" and as such be marked as "down". You might wonder why this is important, but think about it for a second. Many features that vSphere offers rely on networking, and so do your virtual machines. In the case where the state is not reflected, some functionality might just fail; for instance, network heartbeating could fail if it needs to flow through the core switch. We call this a "black hole" scenario: the host sends traffic down a path that it believes is up, but the traffic never reaches its destination due to the failed upstream link.

Basic design principle: Know your network environment, talk to the network administrators, and ensure advanced features like Link State Tracking are used when possible to increase resiliency.
Admission Control

Admission Control is more than likely the most misunderstood concept vSphere holds today, and because of this it is often disabled. However, Admission Control is a must when availability needs to be guaranteed, and isn't that the reason for enabling HA in the first place?

What is HA Admission Control about? Why does HA contain this concept called Admission Control? The "Availability Guide", a.k.a. the HA bible, states the following:

vCenter Server uses admission control to ensure that sufficient resources are available in a cluster to provide failover protection and to ensure that virtual machine resource reservations are respected.

Please read that quote again, and especially the first two words. Indeed, it is vCenter that is responsible for Admission Control, contrary to what many believe. Although this might seem like a trivial fact, it is important to understand that this implies that Admission Control will not disallow HA initiated restarts. HA initiated restarts are done on a host level and not through vCenter.

As said, Admission Control guarantees that capacity is available for an HA initiated failover by reserving resources within a cluster. It calculates the capacity required for a failover based on available resources. In other words, if a host is placed into maintenance mode or disconnected, it is taken out of the equation. This also implies that if a host has failed or is not responding, but has not been removed from the cluster, it is still included in the equation. "Available resources" indicates that the virtualization overhead has already been subtracted from the total amount.

To give an example: VMkernel memory is subtracted from the total amount of memory to obtain the memory available for virtual machines. There is one gotcha with Admission Control that we want to bring to your attention before drilling into the different policies. When Admission Control is enabled, HA will in no way violate availability constraints. This means that it will always ensure multiple hosts are up and running, and this applies to manual maintenance mode actions and, for instance, to VMware Distributed Power Management. So, if a host is stuck trying to enter Maintenance Mode, remember that it might be HA which is not allowing Maintenance Mode to proceed, as it would violate the Admission Control Policy. In this situation, users can manually vMotion virtual machines off the host or temporarily disable admission control to allow the operation to proceed.
But what if you use something like Distributed Power Management (DPM)? Would that place all hosts in standby mode to reduce power consumption? No, DPM is smart enough to take hosts out of standby mode to ensure enough resources are available to provide for HA initiated failovers. If by any chance the resources are not available, HA will wait for these resources to be made available by DPM and then attempt the restart of the virtual machines. In other words, the retry count (5 retries by default) is not wasted in scenarios like these.

Admission Control Policy

The Admission Control Policy dictates the mechanism that HA uses to guarantee enough resources are available for an HA initiated failover. This section gives a general overview of the available Admission Control Policies. The impact of each policy is described in the following section, including our recommendation. HA has three mechanisms to guarantee enough capacity is available to respect virtual machine resource reservations.

Figure 38 - Admission control policy
Below we have listed all three options currently available as the Admission Control Policy. Each option has a different mechanism to ensure resources are available for a failover, and each option has its caveats.

Admission Control Mechanisms

Each Admission Control Policy has its own Admission Control mechanism. Understanding each of these Admission Control mechanisms is important to appreciate the impact each one has on your cluster design. For instance, setting a reservation on a specific virtual machine can have an impact on the achieved consolidation ratio. This section will take you on a journey through the trenches of Admission Control Policies and their respective mechanisms and algorithms.

Host Failures Cluster Tolerates

The Admission Control Policy that has been around the longest is the "Host Failures Cluster Tolerates" policy. It is also historically the least understood Admission Control Policy due to its complex admission control mechanism.

This admission control policy can be configured in an N-1 fashion. This means that the number of host failures you can specify in a 32 host cluster is 31.

Within the vSphere Web Client it is possible to manually specify the slot size, as can be seen in the screenshot below. The vSphere Web Client also allows you to view which virtual machines span multiple slots. This can be very useful in scenarios where the slot size has been explicitly specified; we will explain why in just a second.
Figure 39 - Host Failures

The so-called "slots" mechanism is used when "Host failures cluster tolerates" has been selected as the Admission Control Policy. The details of this mechanism have changed several times in the past, and it is one of the most restrictive policies; more than likely, it is also the least understood.

Slots dictate how many virtual machines can be powered on before vCenter starts yelling "Out Of Resources!" Normally, a slot represents one virtual machine. Admission Control does not limit HA in restarting virtual machines; it ensures enough unfragmented resources are available to power on all virtual machines in the cluster by preventing "over-commitment". Technically speaking, "over-commitment" is not the correct terminology, as Admission Control ensures virtual machine reservations can be satisfied and that all virtual machines' initial memory overhead requirements are met. Although we have already touched on this, it doesn't hurt repeating it, as it is one of those myths that keeps coming back: HA initiated failovers are not prone to the Admission Control Policy. Admission Control is done by vCenter. HA initiated restarts, in a normal scenario, are executed directly on the ESXi host without the use of vCenter. The corner case is where HA requests DRS (DRS is a vCenter task!) to defragment resources, but that is beside the point. Even if resources are low and vCenter would complain, it couldn't stop the restart from happening.

Let's dig into this concept we have just introduced: slots.

A slot is defined as a logical representation of the memory and CPU resources that satisfy the reservation requirements for any powered-on virtual machine in the cluster.

In other words, a slot is the worst case CPU and memory reservation scenario in a cluster. This directly leads to the first "gotcha".
HA uses the highest CPU reservation of any given powered-on virtual machine and the highest memory reservation of any given powered-on virtual machine in the cluster. If no reservation higher than 32 MHz is set, HA will use a default of 32 MHz for CPU. If no memory reservation is set, HA will use a default of 0 MB + memory overhead for memory. (See the VMware vSphere Resource Management Guide for more details on memory overhead per virtual machine configuration.) The following example will clarify what "worst-case" actually means.

Example: If virtual machine "VM1" has 2 GHz of CPU reserved and 1024 MB of memory reserved, and virtual machine "VM2" has 1 GHz of CPU reserved and 2048 MB of memory reserved, the slot size for memory will be 2048 MB (+ its memory overhead) and the slot size for CPU will be 2 GHz. It is a combination of the highest reservation of both virtual machines that leads to the total slot size. Reservations defined at the Resource Pool level, however, will not affect HA slot size calculations.

Basic design principle: Be really careful with reservations. If there's no need to have them on a per virtual machine basis, don't configure them, especially when using Host Failures Cluster Tolerates. If reservations are needed, resort to resource pool based reservations.

Now that we know the worst-case scenario is always taken into account when it comes to slot size calculations, we will describe what dictates the amount of available slots per cluster, as that ultimately dictates how many virtual machines can be powered on in your cluster.

First, we will need to know the slot size for memory and CPU. Next, we will divide the total available CPU resources of a host by the CPU slot size, and the total available memory resources of a host by the memory slot size. This leaves us with a total number of slots for both memory and CPU for a host. The most restrictive number (worst-case scenario) is the number of slots for this host. In other words, when you have 25 CPU slots but only 5 memory slots, the amount of available slots for this host will be 5, as HA always takes the worst case scenario into account to "guarantee" all virtual machines can be powered on in case of a failure or isolation.
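The slot math above can be sketched in a few lines, reusing the VM1/VM2 example (2 GHz / 1024 MB and 1 GHz / 2048 MB) and the 32 MHz CPU default. Memory overhead is left out to keep the arithmetic readable, and the host capacity numbers are made up:

```python
# Illustrative sketch of the slot calculation described above; memory
# overhead is ignored and the host capacities are invented examples.

def slot_size(vms, cpu_default_mhz=32):
    """Slot size = highest CPU reservation and highest memory reservation
    of any powered-on VM, each taken independently (worst case)."""
    cpu = max(max(vm["cpu_mhz"] for vm in vms), cpu_default_mhz)
    mem = max(vm["mem_mb"] for vm in vms)
    return cpu, mem

def slots_per_host(host_cpu_mhz, host_mem_mb, slot):
    """A host's slot count is the most restrictive of its CPU-based and
    memory-based slot counts."""
    cpu_slot, mem_slot = slot
    return min(host_cpu_mhz // cpu_slot, host_mem_mb // mem_slot)

# VM1: 2 GHz / 1024 MB reserved; VM2: 1 GHz / 2048 MB reserved.
vms = [{"cpu_mhz": 2000, "mem_mb": 1024}, {"cpu_mhz": 1000, "mem_mb": 2048}]
slot = slot_size(vms)
print(slot)                                # (2000, 2048)
print(slots_per_host(18000, 16384, slot))  # min(9, 8) = 8 slots
```

Note how a single large reservation drags down the slot count for every host in the cluster, which is exactly why the design principle above warns against casual per-VM reservations.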
The question we receive a lot is: how do I know what my slot size is? The details around slot sizes can be monitored in the HA section of the cluster's Monitor tab by checking the "Advanced Runtime Info" section when the "Host Failures" Admission Control Policy is configured.
Figure 40 - High Availability cluster monitor section

Advanced Runtime Info will show the specifics of the slot size and more useful details, such as the number of slots available, as depicted in Figure 41.
Figure 41 - High Availability advanced runtime info

As you can imagine, using reservations on a per virtual machine basis can lead to very conservative consolidation ratios. However, this is something that is configurable through the Web Client. If you have just one virtual machine with a really high reservation, you can set an explicit slot size by going to "Edit Cluster Services" and specifying it under the Admission Control Policy section, as shown in Figure 39.

If one of these advanced settings is used, HA will ensure that the virtual machine that skewed the numbers can be restarted by "assigning" multiple slots to it. However, when you are low on resources, this could mean that you are not able to power on the virtual machine with this reservation, because resources may be fragmented throughout the cluster instead of available on a single host. HA will notify DRS that a power-on attempt was unsuccessful, and a request will be made to defragment the resources to accommodate the remaining virtual machines that need to be powered on. In order for this to be successful, DRS will need to be enabled and configured to fully automated. When not configured to fully automated, user action is required to execute DRS recommendations.
The following diagram depicts a scenario where a virtual machine spans multiple slots:

Figure 42 - Virtual machine spanning multiple HA slots

Notice that because the memory slot size has been manually set to 1024 MB, one of the virtual machines (grouped with dotted lines) spans multiple slots due to a 4 GB memory reservation. As you might have noticed, none of the hosts has enough resources available to satisfy the reservation of the virtual machine that needs to fail over. Although in total there are enough resources available, they are fragmented, and HA will not be able to power on this particular virtual machine directly, but will request DRS to defragment the resources to accommodate this virtual machine's resource requirements.

Admission Control does not take fragmentation of slots into account when slot sizes are manually defined with advanced settings. It will take the number of slots this virtual machine will consume into account by subtracting them from the total number of available slots, but it will not verify the amount of available slots per host to ensure failover. As stated earlier, though, HA will request DRS to defragment the resources. This is by no means a guarantee of a successful power-on attempt.
Basic design principle: Avoid using advanced settings to decrease the slot size as it could lead to more downtime and adds an extra layer of complexity. If there is a large discrepancy in size and reservations we recommend using the percentage based admission control policy.

Within the vSphere Web Client there is functionality which enables you to identify virtual machines which span multiple slots, as shown in Figure 29. We highly recommend monitoring this section on a regular basis to get a better understanding of your environment and to identify those virtual machines that might be problematic to restart in case of a host failure.

Unbalanced Configurations and Impact on Slot Calculation

It is an industry best practice to create clusters with similar hardware configurations. However, many companies started out with a small VMware cluster when virtualization was first introduced. When the time has come to expand, chances are fairly large the same hardware configuration is no longer available. The question is: will you add the newly bought hosts to the same cluster or create a new cluster?

From a DRS perspective, large clusters are preferred as it increases the load balancing opportunities. However, there is a caveat for DRS as well, which is described in the DRS section of this book. For HA, there is a big caveat. When you think about it and understand the internal workings of HA, more specifically the slot algorithm, you probably already know what is coming up.

Let's first define the term "unbalanced cluster."

An unbalanced cluster would, for instance, be a cluster with 3 hosts of which one contains substantially more memory than the other hosts in the cluster.

Let's try to clarify that with an example.

Example: What would happen to the total number of slots in a cluster with the following specifications?

- Three host cluster
- Two hosts have 16 GB of available memory
- One host has 32 GB of available memory

The third host is a brand new host that has just been bought and as prices of memory dropped immensely the decision was made to buy 32 GB instead of 16 GB.
The cluster contains a virtual machine that has 1 vCPU and 4 GB of memory. A 1024 MB memory reservation has been defined on this virtual machine. As explained earlier, a reservation will dictate the slot size, which in this case leads to a memory slot size of 1024 MB + memory overhead. For the sake of simplicity, we will calculate with 1024 MB. The following diagram depicts this scenario:
Figure 43 - High Availability memory slot size

When Admission Control is enabled and the number of host failures has been selected as the Admission Control Policy, the number of slots will be calculated per host and for the cluster in total. This will result in:

Host     | Number of slots
ESXi-01  | 16 slots
ESXi-02  | 16 slots
ESXi-03  | 32 slots

As Admission Control is enabled, a worst-case scenario is taken into account. When a single host failure has been specified, this means that the host with the largest number of slots will be taken out of the equation. In other words, for our cluster, this would result in:

ESXi-01 + ESXi-02 = 32 slots available
Although you have doubled the amount of memory in one of your hosts, you are still stuck with only 32 slots in total. As clearly demonstrated, there is absolutely no point in buying additional memory for a single host when your cluster is designed with Admission Control enabled and the number of host failures has been selected as the Admission Control Policy.

In our example, the memory slot size happened to be the most restrictive; however, the same principle applies when CPU slot size is most restrictive.

Basic design principle: When using admission control, balance your clusters and be conservative with reservations as it leads to decreased consolidation ratios.

Now, what would happen in the scenario above when the number of allowed host failures is set to 2? In this case ESXi-03 is taken out of the equation and one of any of the remaining hosts in the cluster is also taken out, resulting in 16 slots. This makes sense, doesn't it?
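The worst-case slot arithmetic described above can be sketched in a few lines. This is an illustrative helper only (the function name and inputs are hypothetical, not a VMware API); in reality the per-host slot counts come from dividing each host's capacity by the cluster-wide slot size:

```python
def ha_available_slots(host_slots, host_failures=1):
    """Worst-case slot count used by the 'Host Failures Cluster
    Tolerates' admission control policy: the largest host(s) are
    removed from the equation before summing the remaining slots.

    Hypothetical sketch -- host_slots is a list of per-host slot
    counts derived from the cluster-wide slot size.
    """
    if host_failures:
        remaining = sorted(host_slots)[:-host_failures]
    else:
        remaining = list(host_slots)
    return sum(remaining)

# The unbalanced three-host cluster from the example (16/16/32 slots):
print(ha_available_slots([16, 16, 32], host_failures=1))  # 32
print(ha_available_slots([16, 16, 32], host_failures=2))  # 16
```

With one tolerated failure the 32-slot host is discarded, leaving 32 slots; with two tolerated failures, the two largest hosts are discarded, leaving 16.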
Can you avoid large HA slot sizes due to reservations without resorting to advanced settings? That's the question we get almost daily and the answer is the "Percentage of Cluster Resources Reserved" admission control mechanism.
Percentage of Cluster Resources Reserved

The Percentage of Cluster Resources Reserved admission control policy is one of the most used admission control policies. The simple reason for this is that it is the least restrictive and most flexible. It is also very easy to configure, as shown in the screenshot below.

Figure 44 - Setting a different percentage for CPU/Memory

The main advantage of the percentage based Admission Control Policy is that it avoids the commonly experienced slot size issue where values are skewed due to a large reservation. But if it doesn't use the slot algorithm, what does it use?
When you specify a percentage, and let's assume for now that the percentage for CPU and memory will be configured equally, that percentage of the total amount of available resources will stay reserved for HA purposes. First of all, HA will add up all available resources to see how much it has available (virtualization overhead will be subtracted) in total. Then, HA will calculate how much resources are currently reserved by adding up all reservations for memory and for CPU for all powered-on virtual machines.
For those virtual machines that do not have a reservation, a default of 32 MHz will be used for CPU and a default of 0 MB + memory overhead will be used for memory. (The amount of overhead per configuration type can be found in the "Understanding Memory Overhead" section of the Resource Management guide.)

In other words, admission control verifies that:

((total amount of available resources – total reserved virtual machine resources) / total amount of available resources) >= (percentage HA should reserve as spare capacity)

Total reserved virtual machine resources includes the default reservation of 32 MHz and the memory overhead of the virtual machine.
Let's use a diagram to make it a bit clearer:

Figure 45 - Percentage of cluster resources reserved

Total cluster resources are 24 GHz (CPU) and 96 GB (MEM). This would lead to the following calculations:

((24 GHz - (2 GHz + 1 GHz + 32 MHz + 4 GHz)) / 24 GHz) = 69% available
((96 GB - (1.1 GB + 114 MB + 626 MB + 3.2 GB)) / 96 GB) = 85% available

As you can see, the amount of memory differs from the diagram. Even if a reservation has been set, the amount of memory overhead is added to the reservation. This example also demonstrates how keeping the CPU and memory percentages equal could create an imbalance. Ideally, of course, the hosts are provisioned in such a way that there is no CPU/memory imbalance. Experience over the years has proven, unfortunately, that most environments run out of memory resources first and this might need to be factored in when calculating the correct value for the percentage. However, this trend might be changing as memory is getting cheaper every day.
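The percentage calculation can be sketched as follows. Note that this simplified helper (hypothetical, not a VMware API) ignores the per-VM memory/CPU overhead that HA adds to each reservation, which is why its result differs slightly from the figures quoted above:

```python
def spare_capacity_pct(total, reservations):
    """Fraction of cluster resources still available for failover, per
    the percentage-based admission control formula:
    (total - sum(reservations)) / total.

    Illustrative sketch only; real HA also adds per-VM overhead to
    each reservation before subtracting.
    """
    return round((total - sum(reservations)) / total * 100)

# CPU side of the example: 24 GHz total, reservations of 2 GHz, 1 GHz,
# the 32 MHz default, and 4 GHz (overhead excluded here):
print(spare_capacity_pct(24, [2, 1, 0.032, 4]))  # 71
```

The result (71%) lands slightly above the 69% in the text because the per-VM overhead is left out of this sketch.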
In order to ensure virtual machines can always be restarted, Admission Control will constantly monitor if the policy has been violated or not. Please note that this Admission Control process is part of vCenter and not of the ESXi host! When one of the thresholds is reached, memory or CPU, Admission Control will disallow powering on any additional virtual machines as that could potentially impact availability. These thresholds can be monitored in the HA section of the Cluster's summary tab.

Figure 46 - High Availability summary

If you have an unbalanced cluster (hosts with different sizes of CPU or memory resources), your percentage should be equal to or preferably larger than the percentage of resources provided by the largest host. This way you ensure that all virtual machines residing on this host can be restarted in case of a host failure.

As explained earlier, this Admission Control Policy does not use slots. As such, resources might be fragmented throughout the cluster. Although DRS is notified to rebalance the cluster, if needed, to accommodate these virtual machines' resource requirements, a guarantee cannot be given. We recommend selecting the highest restart priority for this virtual machine (of course, depending on the SLA) to ensure it will be able to boot.
The following example and diagram (Figure 37) will make it more obvious: You have 3 hosts, each with roughly 80% memory usage, and you have configured HA to reserve 20% of resources for both CPU and memory. A host fails and all virtual machines will need to failover. One of those virtual machines has a 4 GB memory reservation. As you can imagine, HA will not be able to initiate a power-on attempt, as there are not enough memory resources available to guarantee the reserved capacity. Instead an event will get generated indicating "not enough resources for failover" for this virtual machine.

Figure 47 - Available resources

Basic design principle: Although HA will utilize DRS to try to accommodate for the resource requirements of this virtual machine, a guarantee cannot be given. Do the math; verify that any single host has enough resources to power on your largest virtual machine. Also take restart priority into account for this/these virtual machine(s).

Failover Hosts

The third option one could choose is to select one or multiple designated failover hosts. This is commonly referred to as a hot standby.
Figure 48 - Select failover hosts Admission Control Policy

It is "what you see is what you get". When you designate hosts as failover hosts, they will not participate in DRS and you will not be able to run virtual machines on these hosts! These hosts are literally reserved for failover situations. HA will attempt to use these hosts first to failover the virtual machines. If, for whatever reason, this is unsuccessful, it will attempt a failover on any of the other hosts in the cluster. For example, if three hosts would fail, including the hosts designated as failover hosts, HA will still try to restart the impacted virtual machines on the host that is left. Although this host was not a designated failover host, HA will use it to limit downtime.
Figure 49 - Select multiple failover hosts

Decision Making Time

As with any decision you make, there is an impact to your environment. This impact could be positive but also, for instance, unexpected. This especially goes for HA Admission Control. Selecting the right Admission Control Policy can lead to a quicker Return On Investment and a lower Total Cost of Ownership. In the previous section, we described all the algorithms and mechanisms that form Admission Control and in this section we will focus more on the design considerations around selecting the appropriate Admission Control Policy for your or your customer's environment.

The first decision that will need to be made is whether Admission Control will be enabled. We generally recommend enabling Admission Control as it is the only way of guaranteeing your virtual machines will be allowed to restart after a failure. It is important, though, that the policy is carefully selected and fits your or your customer's requirements.

Basic design principle: Admission control guarantees enough capacity is available for virtual machine failover. As such we recommend enabling it.
Although we have already explained all the mechanisms that are being used by each of the policies in the previous section, we will give a high-level overview and list all the pros and cons in this section. On top of that, we will expand on what we feel is the most flexible Admission Control Policy and how it should be configured and calculated.

Host Failures Cluster Tolerates

This option is, historically speaking, the most used for Admission Control. Most environments are designed with an N+1 redundancy and N+2 is also not uncommon. This Admission Control Policy uses "slots" to ensure enough capacity is reserved for failover, which is a fairly complex mechanism. Slots are based on VM-level reservations and if reservations are not used a default slot size for CPU of 32 MHz is defined and for memory the largest memory overhead of any given virtual machine is used.

Pros:

- Fully automated (when a host is added to a cluster, HA re-calculates how many slots are available)
- Guarantees failover by calculating slot sizes

Cons:

- Can be very conservative and inflexible when reservations are used, as the largest reservation dictates slot sizes
- Unbalanced clusters lead to wastage of resources
- Complexity for the administrator from a calculation perspective
Percentage as Cluster Resources Reserved

The percentage based Admission Control is based on a per-reservation calculation instead of the slots mechanism. The percentage based Admission Control Policy is less conservative than "Host Failures" and more flexible than "Failover Hosts".

Pros:

- Accurate, as it considers the actual reservation per virtual machine to calculate available failover resources
- Cluster dynamically adjusts when resources are added

Cons:

- Manual calculations are needed when adding additional hosts to a cluster while the number of host failures needs to remain unchanged
- Unbalanced clusters can be a problem when the chosen percentage is too low and resources are fragmented, which means failover of a virtual machine can't be guaranteed as the reservation of this virtual machine might not be available as a block of resources on a single host
Please note that, although a failover cannot be guaranteed, there are few scenarios where a virtual machine will not be able to restart, due to the integration HA offers with DRS and the fact that most clusters have spare capacity available to account for virtual machine demand variance. Although this is a corner-case scenario, it needs to be considered in environments where absolute guarantees must be provided.

Specify Failover Hosts

With the "Specify Failover Hosts" Admission Control Policy, when one or multiple hosts fail, HA will attempt to restart all virtual machines on the designated failover hosts. The designated failover hosts are essentially "hot standby" hosts. In other words, DRS will not migrate virtual machines to these hosts when resources are scarce or the cluster is imbalanced.

Pros:

- What you see is what you get
- No fragmented resources

Cons:

- What you see is what you get
- Dedicated failover hosts are not utilized during normal operations

Recommendations

We have been asked many times for our recommendation on Admission Control and it is difficult to answer as each policy has its pros and cons. However, we generally recommend a Percentage based Admission Control Policy. It is the most flexible policy as it uses the actual reservation per virtual machine instead of taking a "worst case" scenario approach like the number of host failures does. However, the number of host failures policy guarantees the failover level under all circumstances. Percentage based is less restrictive, but offers lower guarantees that in all scenarios HA will be able to restart all virtual machines. With the added level of integration between HA and DRS we believe a Percentage based Admission Control Policy will fit most environments.
Basic design principle: Do the math, and take customer requirements into account. We recommend using a "percentage" based admission control policy, as it is the most flexible.

Now that we have recommended which Admission Control Policy to use, the next step is to provide guidance around selecting the correct percentage. We cannot tell you what the ideal percentage is as that totally depends on the size of your cluster and, of course, on your resiliency model (N+1 vs. N+2). We can, however, provide guidelines around calculating how much of your resources should be set aside and how to prevent wasting resources.

Selecting the Right Percentage

It is a common strategy to select a single host as a percentage of resources reserved for failover. We generally recommend selecting a percentage which is the equivalent of a single or multiple hosts. Let's explain why and what the impact is of not using the equivalent of a single or multiple hosts.

Let's start with an example: a cluster consists of 8 ESXi hosts, each containing 70 GB of available RAM. This might sound like an awkward memory configuration but to simplify things we have already subtracted 2 GB as virtualization overhead. Although virtualization overhead is probably less than 2 GB, we have used this number to make the calculations easier. This example zooms in on memory but this concept also applies to CPU, of course.

For this cluster we will define the percentage of resources to reserve for both memory and CPU as 20%. For memory, this leads to a total usable cluster memory capacity of 448 GB:

(70 GB + 70 GB + 70 GB + 70 GB + 70 GB + 70 GB + 70 GB + 70 GB) * (1 – 20%)

A total of 112 GB of memory is reserved as failover capacity.

Once a percentage is specified, that percentage of resources will be unavailable for virtual machines; therefore it makes sense to set the percentage as close as possible to the value that equals the resources a single (or multiple) host represents. We will demonstrate why this is important in subsequent examples.
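The arithmetic above can be sketched in a couple of lines (hypothetical helper, not a VMware API):

```python
def reserve_for_failover(host_mem_gb, reserve_pct):
    """Split cluster memory into reserved failover capacity and usable
    capacity under the percentage-based admission control policy.
    Illustrative sketch only."""
    total = sum(host_mem_gb)
    reserved = total * reserve_pct / 100
    return reserved, total - reserved

# Eight hosts with 70 GB of available RAM each, 20% reserved:
reserved, usable = reserve_for_failover([70] * 8, 20)
print(reserved, usable)  # 112.0 448.0
```

This reproduces the example: 112 GB is set aside for failover, leaving 448 GB usable.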
In the example above, 20% was reserved for resources in an 8-host cluster. This configuration reserves more resources than a single host contributes to the cluster. HA's main objective is to provide automatic recovery for virtual machines after a physical server failure. For this reason, it is recommended to reserve resources equal to a single or multiple hosts. When using the per-host level of granularity in an 8-host cluster (homogeneously configured hosts), the resource contribution per host to the cluster is 12.5%. However, the percentage used must be an integer (whole number). It is recommended to round up to the value guaranteeing that the full capacity of one host is protected; in this example (Figure 40), the conservative approach would lead to a percentage of 13%.
Figure 50 - Setting the correct value

Aggressive Approach

We have seen many environments where the percentage was set to a value that was less than the contribution of a single host to the cluster. Although this approach reduces the amount of resources reserved for accommodating host failures and results in higher consolidation ratios, it also offers a lower guarantee that HA will be able to restart all virtual machines after a failure. One might argue that this approach will more than likely work as most environments will not be fully utilized; however, it also eliminates the guarantee that after a failure all virtual machines will be recovered. Wasn't that the reason for enabling HA in the first place?
Adding Hosts to Your Cluster

Although the percentage is dynamic and calculates capacity at a cluster level, changes to your selected percentage might be required when expanding the cluster. The reason is that the amount of reserved resources for a failover might not correspond with the contribution per host and as a result lead to resource wastage. For example, adding 4 hosts to an 8-host cluster and continuing to use the previously configured admission control policy value of 13% will result in a failover capacity that is equivalent to roughly 1.5 hosts. Figure 41 depicts a scenario where an 8-host cluster is expanded to 12 hosts. Each host holds 8 2-GHz cores and 70 GB of memory. The cluster was originally configured with admission control set to 13%, which equals 109.2 GB and 24.96 GHz. If the requirement is to allow a single host failure, 7.68 GHz and 33.6 GB is "wasted", as clearly demonstrated in the diagram below.
Figure 51 - Avoid wasting resources
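The waste in this expansion scenario can be reproduced with a short calculation. The helper below is hypothetical; it compares the configured percentage against the rounded-up single-host-failure percentage for the grown cluster (1/12 ≈ 8.33%, rounded up to 9%):

```python
import math

def wasted_capacity(n_hosts, host_ghz, host_gb, configured_pct):
    """Resources reserved beyond what a rounded-up single-host-failure
    percentage would require after a cluster expansion.
    Hypothetical helper illustrating the example's arithmetic."""
    needed_pct = math.ceil(1 / n_hosts * 100)   # e.g. 1/12 -> 9%
    delta = (configured_pct - needed_pct) / 100
    return n_hosts * host_ghz * delta, n_hosts * host_gb * delta

# 12 hosts of 16 GHz (8 x 2 GHz cores) / 70 GB, still using the old
# 13% setting from the 8-host days:
ghz_waste, gb_waste = wasted_capacity(12, 16, 70, 13)
print(round(ghz_waste, 2), round(gb_waste, 1))  # 7.68 33.6
```

This matches the 7.68 GHz and 33.6 GB quoted above: the 4-point gap between 13% and 9% is pure reservation overshoot.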
How to Define Your Percentage?

As explained earlier, it will fully depend on the N+X model that has been chosen. Based on this model, we recommend selecting a percentage that equals the amount of resources the failed-over host(s) would represent. So, in the case of an 8-host cluster and N+2 resiliency, the percentage should be set as follows: 2 / 8 * 100 = 25%

Basic design principle: In order to avoid wasting resources we recommend carefully selecting your N+X resiliency architecture. Calculate the required percentage based on this architecture.
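Assuming the N+X approach above, the percentage can be derived as follows (hypothetical helper; the setting must be a whole number, hence the round-up):

```python
import math

def admission_control_pct(n_hosts, host_failures):
    """Percentage of cluster resources to reserve for an N+X design,
    rounded up to an integer as the setting requires. Sketch only."""
    return math.ceil(host_failures / n_hosts * 100)

print(admission_control_pct(8, 1))  # 13  (N+1 on 8 hosts)
print(admission_control_pct(8, 2))  # 25  (N+2 on 8 hosts)
```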
VM and Application Monitoring

VM and Application Monitoring is an often overlooked but really powerful feature of HA. The reason for this is most likely that it is disabled by default and relatively new compared to HA. We have tried to gather all the information we could around VM and Application Monitoring, but it is a pretty straightforward feature that actually does what you expect it to do.

Figure 52 - VM and Application Monitoring

Why Do You Need VM/Application Monitoring?

VM and Application Monitoring acts on a different level from HA. VM/App Monitoring responds to a single virtual machine or application failure as opposed to HA which responds to a host failure. An example of a single virtual machine failure would, for instance, be the infamous "blue screen of death". In the case of App Monitoring, the type of failure that triggers a response is defined by the application developer or administrator.

How Does VM/App Monitoring Work?
VM Monitoring resets individual virtual machines when needed. VM/App Monitoring uses a heartbeat similar to HA. If heartbeats, in this case VMware Tools heartbeats, are not received for a specific (and configurable) amount of time, the virtual machine will be restarted. These heartbeats are monitored by the HA agent and are not sent over a network, but stay local to the host.
Figure 53 - VM Monitoring sensitivity

When enabling VM/App Monitoring, the level of sensitivity (Figure 43) can be configured. The default setting should fit most situations. Low sensitivity basically means that the number of allowed "missed" heartbeats is higher and the chances of running into a false positive are lower. However, if a failure occurs and the sensitivity level is set to Low, the experienced downtime will be higher. When quick action is required in the event of a failure, "high sensitivity" can be selected. As expected, this is the opposite of "low sensitivity". Note that the advanced settings mentioned in the following table are deprecated and listed for educational purposes.

Sensitivity | Failure interval | Max failures | Max resets time window
Low         | 120 seconds      | 3            | 7 days
Medium      | 60 seconds       | 3            | 24 hours
High        | 30 seconds       | 3            | 1 hour

It is important to remember that VM Monitoring does not infinitely reboot virtual machines, unless you specify a custom policy with this requirement. This is to avoid a problem from repeating. By default, when a virtual machine has been rebooted three times within an hour, no further attempts will be taken until the specified time window has elapsed. Advanced settings can be used to change this default behavior, or "custom" can be selected as shown in Figure 43.
Although the heartbeat produced by VMware Tools is reliable, VMware added a further verification mechanism. To avoid false positives, VM Monitoring also monitors I/O activity of the virtual machine. When heartbeats are not received AND no disk or network activity has occurred over the last 120 seconds (by default), the virtual machine will be reset. Changing the advanced setting "das.iostatsInterval" can modify this 120-second interval.
It is recommended to align das.iostatsInterval with the failure interval selected in the VM Monitoring section of vSphere HA within the Web Client or the vSphere Client.

Basic design principle: Align das.iostatsInterval with the failure interval.

Screenshots

One of the most useful features of VM Monitoring is the fact that it takes screenshots of the virtual machine's console. The screenshots are taken right before VM Monitoring resets a virtual machine. It is a very useful feature when a virtual machine "freezes" every once in a while for no apparent reason. This screenshot can be used to debug the virtual machine's operating system when needed, and is stored in the virtual machine's working directory, as logged in the Events view on the Monitor tab of the virtual machine.

Basic design principle: VM and Application Monitoring can substantially increase availability. It is part of the HA stack and we strongly recommend using it!

VM Monitoring Implementation Details

VM/App Monitoring is implemented as part of the HA agent itself. The agent uses the "Performance Manager" to monitor disk and network I/O; VM/App Monitoring uses the "usage" counters for both disk and network and it requests these counters once enough heartbeats have been missed that the configured policy is triggered.

As stated before, VM/App Monitoring uses heartbeats just like host-level HA. The heartbeats are monitored by the HA agent, which is responsible for the restarts. Of course, this information is also being rolled up into vCenter, but that is done via the Management Network, not using the virtual machine network. This is crucial to know, as it means that when a virtual machine network error occurs, the virtual machine heartbeat will still be received. When an error occurs, HA will trigger a restart of the virtual machine when all three conditions are met:

1. No VMware Tools heartbeat received
2. No network I/O over the last 120 seconds
3. No storage I/O over the last 120 seconds

Just like with host-level HA, the HA agent works independently of vCenter when it comes to virtual machine restarts.
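The three conditions can be summarized in a small decision sketch (hypothetical helper; the real check is performed by the HA agent against Performance Manager counters over the das.iostatsInterval window):

```python
def should_reset_vm(tools_heartbeat, net_io_bytes, disk_io_bytes):
    """Sketch of the VM Monitoring reset decision: a reset is only
    issued when the Tools heartbeat is lost AND the VM produced no
    network or storage I/O during the interval (120 s by default).
    Illustrative only, not VMware code."""
    return (not tools_heartbeat) and net_io_bytes == 0 and disk_io_bytes == 0

print(should_reset_vm(False, 0, 0))     # True  -> reset the VM
print(should_reset_vm(False, 4096, 0))  # False -> I/O seen, no reset
print(should_reset_vm(True, 0, 0))      # False -> heartbeat healthy
```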
Timing

The VM/App Monitoring feature monitors the heartbeat(s) issued by a guest and resets the virtual machine if there is a heartbeat failure that satisfies the configured policy for the virtual machine. HA can monitor just the heartbeats issued by the VMware Tools process, or can monitor these heartbeats plus those issued by an optional in-guest agent.

If the VM Monitoring heartbeats stop at time T-0, the minimum time before HA will declare a heartbeat failure is in the range of 91 seconds to 119 seconds, whereas for heartbeats issued by an in-guest application agent, HA will declare a failure in the range of 61 seconds to 89 seconds. Once a heartbeat failure is declared for application heartbeats, HA will attempt to reset the virtual machine. However, for VMware Tools heartbeats, HA will first check whether any I/O has been issued by the virtual machine for the last 2 minutes (by default) and only if there has been no I/O will it issue a reset. Due to how HOSTD publishes the I/O statistics, this check could delay the reset by approximately 20 seconds for virtual machines that were issuing I/O within approximately 1 minute of T-0.

Timing details: the range depends on when the heartbeats stop relative to the HOSTD thread that monitors them. For the lower bound of the VMware Tools heartbeats, the heartbeats stop a second before the HOSTD thread runs, which means that at T+31 the FDM agent on the host will be notified of a tools yellow state, and then at T+61 of the red state, which HA reacts to. HA then monitors the heartbeat failure for a minimum of 30 seconds, leading to the minimum of T+91. The 30-second monitoring period done by HA can be increased using the das.failureInterval policy setting. For the upper bound, the FDM is not notified until T+89 (at T-0 the failure occurs, at T+29 HOSTD notices it and starts the heartbeat failure timer, at T+59 HOSTD reports a yellow state, and at T+89 it reports a red state).

For the heartbeats issued by an in-guest agent, no yellow state is sent, so there is no additional 30-second period.
Application Monitoring

Application Monitoring is a part of VM Monitoring. Application Monitoring is a feature that partners and/or customers can leverage to increase resiliency, as shown in the screenshot below, but from an application point of view rather than from a VM point of view. There is an SDK available to the general public and it is part of the Guest SDK.
Figure 54 - VM and Application Monitoring

The Guest SDK is currently primarily used by application developers from partners like Symantec to develop solutions that increase resilience on a different level than VM Monitoring and HA. In the case of Symantec, a simplified version of Veritas Cluster Server (VCS) is used to enable application availability monitoring, including responding to issues. Note that this is not a multi-node clustering solution like VCS itself, but a single-node solution.

Symantec ApplicationHA, as it is called, is triggered to get the application up and running again by restarting it. Symantec's ApplicationHA is aware of dependencies and knows in which order services should be started or stopped. If, however, this fails for a certain number of times (a configurable option within ApplicationHA), VMware HA will be requested to take action. This action will be a restart of the virtual machine.

Although Application Monitoring is relatively new and there are only a few partners currently exploring the capabilities, in our opinion, it does add a whole new level of resiliency. Your in-house development team could leverage functionality offered through the API, or you could use a solution developed by one of VMware's partners. We have tested ApplicationHA by Symantec and personally feel it is the missing link. It enables you as a system administrator to integrate your virtualization layer with your application layer. It ensures that services which are protected are restarted in the correct order and it avoids the common pitfalls associated with restarts and maintenance. Note that VMware also introduced an "Application Monitoring" solution which was based on Hyperic technology; this product, however, has been deprecated and as such will not be discussed in this publication.
Application Awareness API

The Application Awareness API is open for everyone. We feel that this is not the place to do a full deep dive on how to use it, but we do want to discuss it briefly.

The Application Awareness API allows anyone to talk to it, including scripts, which makes the possibilities endless. Currently there are 6 functions defined:

- VMGuestAppMonitor_Enable(): enables monitoring
- VMGuestAppMonitor_MarkActive(): call every 30 seconds to mark the application as active
- VMGuestAppMonitor_Disable(): disables monitoring
- VMGuestAppMonitor_IsEnabled(): returns the status of monitoring
- VMGuestAppMonitor_GetAppStatus(): returns the current application status recorded for the application
- VMGuestAppMonitor_Free(): frees the result of the VMGuestAppMonitor_GetAppStatus() call
These functions can be used by your development team; however, App Monitoring also offers a new executable. This allows you to use the functionality App Monitoring offers without the need to compile a full binary. This new command, vmware-appmonitor.exe, takes the following arguments, which are, not coincidentally, similar to the functions:

- enable
- disable
- markActive
- isEnabled
- getAppStatus

When running the command vmware-appmonitor.exe, which can be found under "VMware-GuestAppMonitorSDK\bin\win32\", the following output is presented:

Usage: vmware-appmonitor.exe {enable | disable | markActive | isEnabled | getAppStatus}

As shown, there are multiple ways of leveraging Application Monitoring to enhance resiliency on an application level.
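A guest-side watchdog script could wrap the executable along these lines. This is a sketch only: the path constant and wrapper function are our assumptions, and the commented-out call would only work inside a guest where the Guest SDK is installed:

```python
import subprocess

# Hypothetical install path; adjust to wherever the Guest SDK lives.
APPMONITOR = r"VMware-GuestAppMonitorSDK\bin\win32\vmware-appmonitor.exe"

def appmonitor_cmd(action):
    """Build a command line for one of the documented actions and
    reject anything the executable does not support."""
    valid = {"enable", "disable", "markActive", "isEnabled", "getAppStatus"}
    if action not in valid:
        raise ValueError(f"unsupported action: {action}")
    return [APPMONITOR, action]

# Inside the guest, a watchdog would run something like this every
# 30 seconds while the monitored application is healthy:
# subprocess.run(appmonitor_cmd("markActive"), check=True)
print(appmonitor_cmd("markActive")[1])  # markActive
```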
vSphere HA and...

Now that you know how HA works inside out, we want to explain the different integration points between HA, DRS and Storage DRS.

HA and Storage DRS

vSphere HA informs Storage DRS when a failure has occurred. This is to prevent the relocation of any HA-protected virtual machine, meaning a virtual machine that was powered on, but which failed, and has not been restarted yet due to there being insufficient capacity available. Further, Storage DRS is not allowed to Storage vMotion a virtual machine that is owned by a master other than the one vCenter Server is talking to. This is because in such a situation, HA would not be able to reprotect the virtual machine until the master to which vCenter Server is talking is able to lock the datastore again.

Storage vMotion and HA

If a virtual machine needs to be restarted by HA and the virtual machine is in the process of being Storage vMotioned and the virtual machine fails, the restart process is not started until vCenter informs the master that the Storage vMotion task has completed or has been rolled back. If the source host fails, however, HA will restart the virtual machine as part of the normal workflow. During a Storage vMotion, the HA agent on the host on which the Storage vMotion was initiated masks the failure state of the virtual machine. If, for whatever reason, vCenter is unavailable, the masking will time out after 15 minutes to ensure that the virtual machine will be restarted.

Also note that when a Storage vMotion completes, vCenter will report the virtual machine as unprotected until the master reports it protected again under the new path.

HA and DRS

HA integrates on multiple levels with DRS. It is a huge improvement and it is something that we wanted to stress as it has changed both the behavior and the reliability of HA.

HA and Resource Fragmentation
When a failover is initiated, HA will first check whether there are resources available on the destination hosts for the failover. If, for instance, a particular virtual machine has a very large reservation and the Admission Control Policy is based on a percentage, it could happen that resources are fragmented across multiple hosts. (For more details on this scenario, see Chapter 7.) HA will ask DRS to defragment the resources to accommodate this virtual machine's resource requirements. Although HA will request a defragmentation of resources, a guarantee cannot be given. As such, even with this additional integration, you should still be cautious when it comes to resource fragmentation.

Flattened Shares

When custom shares have been set on a virtual machine, an issue can arise when that VM needs to be restarted. When HA fails over a virtual machine, it will power on the virtual machine in the Root Resource Pool. However, the virtual machine's shares were those configured by a user for it, and not scaled for it being parented under the Root Resource Pool. This could cause the virtual machine to receive either too many or too few resources relative to its entitlement.

A scenario where and when this can occur would be the following:

VM1 has 1000 shares and Resource Pool A has 2000 shares. However, Resource Pool A has 2 virtual machines and both virtual machines will have 50% of those "2000" shares. The following diagram depicts this scenario:
Figure 55 - Flatten shares starting point

When the host fails, both VM2 and VM3 will end up on the same level as VM1, the Root Resource Pool. However, as a custom shares value of 10,000 was specified on both VM2 and VM3, they will completely blow away VM1 in times of contention. This is depicted in the following diagram:
Figure 56 - Flatten shares host failure

This situation would persist until the next invocation of DRS re-parents the virtual machines VM2 and VM3 to their original Resource Pool. To address this issue, HA calculates a flattened share value before the virtual machine is failed over. This flattening process ensures that the virtual machine will get the resources it would have received if it had failed over to the correct Resource Pool. This scenario is depicted in the following diagram. Note that both VM2 and VM3 are placed under the Root Resource Pool with a shares value of 1000.
Figure 57 - Flatten shares after host failure before DRS invocation

Of course, when DRS is invoked, both VM2 and VM3 will be re-parented under Resource Pool 1 and will again receive the number of shares they had been originally assigned.
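Based on the example above, the flattening appears to boil down to scaling the pool's shares by the VM's fraction of the shares inside that pool. The helper below is our interpretation of that example, not VMware's actual implementation:

```python
def flattened_shares(pool_shares, vm_shares, pool_vm_shares_total):
    """Approximate sketch of HA's share flattening: the pool's shares
    are scaled by the VM's fraction of all shares inside the pool, so
    the VM competes fairly while temporarily parented at the root.
    Hypothetical formula inferred from the book's example."""
    return pool_shares * vm_shares / pool_vm_shares_total

# Resource Pool A holds 2000 shares; VM2 and VM3 each own 10,000 of
# the 20,000 shares configured inside the pool (i.e. 50% each):
print(flattened_shares(2000, 10_000, 20_000))  # 1000.0
```

This reproduces the 1000-share value shown in Figure 57 for VM2 and VM3.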
DPM and HA

If DPM is enabled and resources are scarce during an HA failover, HA will use DRS to try to adjust the cluster (for example, by bringing hosts out of standby mode or migrating virtual machines to defragment the cluster resources) so that HA can perform the failovers.

If HA strict Admission Control is enabled (the default), DPM will maintain the necessary level of powered-on capacity to meet the configured HA failover capacity. HA places a constraint to prevent DPM from powering down too many ESXi hosts if doing so would violate the Admission Control Policy.

When HA admission control is disabled, HA will prevent DPM from powering off all but one host in the cluster. A minimum of two hosts is kept powered on regardless of the resource consumption. The reason this behavior has changed is that it is impossible to restart virtual machines when the only host left in the cluster has just failed.
In a failure scenario, if HA cannot restart some virtual machines, it asks DRS/DPM to try to defragment resources or bring hosts out of standby to allow HA another opportunity to restart the virtual machines. Another change is that DRS/DPM will power on or keep on hosts needed to address cluster constraints, even if those hosts are lightly utilized. Once again, in order for this to be successful, DRS will need to be enabled and configured to fully automated. When not configured to fully automated, user action is required to execute DRS recommendations and allow the restart of virtual machines to occur.
Use Case: Stretched Cluster

In this part we will be discussing a specific infrastructure architecture and how HA, DRS and Storage DRS can be leveraged and should be deployed to increase availability. Be it availability of your workload or the resources provided to your workload, we will guide you through some of the design considerations and decision points along the way. Of course, a full understanding of your environment will be required in order to make appropriate decisions regarding specific implementation details. Nevertheless, we hope that this section will provide a proper understanding of how certain features play together and how these can be used to meet the requirements of your environment and build the desired architecture.
Scenario

The scenario we have chosen is a stretched cluster, also referred to as a VMware vSphere Metro Storage Cluster solution. We have chosen this specific scenario as it allows us to explain a multitude of design and architectural considerations. Although this scenario has been tested and validated in our lab, every environment is unique; our recommendations are based on our experience and your mileage may vary.
A VMware vSphere Metro Storage Cluster (vMSC) configuration is a VMware vSphere certified solution that combines synchronous replication with storage array based clustering. These solutions are typically deployed in environments where the distance between datacenters is limited, often metropolitan or campus environments.
The primary benefit of a stretched cluster model is that it enables fully active and workload-balanced datacenters to be used to their full potential. Many customers find this architecture attractive due to the capability of migrating virtual machines with vMotion and Storage vMotion between sites. This enables on-demand and non-intrusive cross-site mobility of workloads. The capability of a stretched cluster to provide this active balancing of resources should always be the primary design and implementation goal.
Stretched cluster solutions offer the following benefits:
- Workload mobility
- Cross-site automated load balancing
- Enhanced downtime avoidance
- Disaster avoidance
Technical requirements and constraints
Due to the technical constraints of an online migration of VMs, the following specific requirements, which are listed in the VMware Compatibility Guide, must be met prior to consideration of a stretched cluster implementation:
- Storage connectivity using Fibre Channel, iSCSI, NFS, and FCoE is supported.
- The maximum supported network latency between sites for the VMware ESXi management networks is 10ms round-trip time (RTT).
- vMotion and Storage vMotion support a maximum of 150ms latency as of vSphere 6.0, but this is not intended for stretched clustering usage.
- The maximum supported latency for synchronous storage replication links is 10ms RTT. Refer to documentation from the storage vendor, because the maximum tolerated latency is lower in most cases. The most commonly supported maximum RTT is 5ms.
- The ESXi vSphere vMotion network requires a redundant network link with a minimum of 250Mbps.
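The latency and bandwidth limits above can be captured in a small design check. This is a hypothetical helper of our own (the parameter names are assumptions, not VMware terminology); only the numeric limits come from the list above.

```python
def meets_vmsc_limits(mgmt_rtt_ms, storage_rtt_ms, vmotion_mbps,
                      vendor_storage_rtt_limit_ms=10):
    """Check a proposed stretched-cluster design against the vMSC limits:
    10ms management RTT, 10ms (or lower vendor limit) storage RTT,
    and at least 250Mbps on the vMotion network."""
    return (mgmt_rtt_ms <= 10
            and storage_rtt_ms <= min(10, vendor_storage_rtt_limit_ms)
            and vmotion_mbps >= 250)

print(meets_vmsc_limits(5, 4, 500, vendor_storage_rtt_limit_ms=5))  # True
print(meets_vmsc_limits(5, 8, 500, vendor_storage_rtt_limit_ms=5))  # False
```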
The storage requirements are slightly more complex. A vSphere Metro Storage Cluster requires what is in effect a single storage subsystem that spans both sites. In this design, a given datastore must be accessible (that is, be able to be read and written to) simultaneously from both sites. Further, when problems occur, the ESXi hosts must be able to continue to access datastores from either array transparently and with no impact to ongoing storage operations.
This precludes traditional synchronous replication solutions, because they create a primary-secondary relationship between the active (primary) LUN where data is being accessed and the secondary LUN that is receiving replication. To access the secondary LUN, replication is stopped, or reversed, and the LUN is made visible to hosts. This "promoted" secondary LUN has a completely different LUN ID and is essentially a newly available copy of a former primary LUN. This type of solution works for traditional disaster recovery-type configurations, because it is expected that VMs must be started up on the secondary site. The vMSC configuration requires simultaneous, uninterrupted access to enable live migration of running VMs between sites.
The storage subsystem for a vMSC must be able to be read from and written to in both locations simultaneously. All disk writes are committed synchronously at both locations to ensure that data is always consistent regardless of the location from which it is being read. This storage architecture requires significant bandwidth and very low latency between the sites in the cluster. Increased distances or latencies cause delays in writing to disk and a dramatic decline in performance. They also preclude successful vMotion migration between cluster nodes that reside in different locations.
Uniform versus Non-Uniform
vMSC solutions are classified into two distinct categories, based on a fundamental difference in how hosts access storage. It is important to understand the different types of stretched storage solutions because this influences design considerations. The following two main categories are as described on the VMware Hardware Compatibility List:
- Uniform host access configuration – ESXi hosts from both sites are all connected to a storage node in the storage cluster across all sites. Paths presented to ESXi hosts are stretched across a distance.
- Nonuniform host access configuration – ESXi hosts at each site are connected only to storage node(s) at the same site. Paths presented to ESXi hosts from storage nodes are limited to the local site.
The following in-depth descriptions of both categories clearly define them from architectural and implementation perspectives.
With uniform host access configuration, hosts in datacenter A and datacenter B have access to the storage systems in both datacenters. In effect, the storage area network is stretched between the sites, and all hosts can access all LUNs. NetApp MetroCluster software is an example of uniform storage. In this configuration, read/write access to a LUN takes place on one of the two arrays, and a synchronous mirror is maintained in a hidden, read-only state on the second array. For example, if a LUN containing a datastore is read/write on the array in datacenter A, all ESXi hosts access that datastore via the array in datacenter A. For ESXi hosts in datacenter A, this is local access. ESXi hosts in datacenter B that are running VMs hosted on this datastore send read/write traffic across the network between datacenters. In case of an outage, or an operator-controlled shift of control of the LUN to datacenter B, all ESXi hosts continue to detect the identical LUN being presented, but it is now being accessed via the array in datacenter B.
The ideal situation is one in which VMs access a datastore that is controlled (read/write) by the array in the same datacenter. This minimizes traffic between datacenters and avoids the performance impact of reads traversing the interconnect.
The notion of "site affinity" for a VM is dictated by the read/write copy of the datastore. "Site affinity" is also sometimes referred to as "site bias" or "LUN locality." This means that when a VM has site affinity with datacenter A, its read/write copy of the datastore is located in datacenter A. This is explained in more detail in the "vSphere DRS" subsection of this section.
Figure 58 - Uniform Configuration
With nonuniform host access configuration, hosts in datacenter A have access only to the array within the local datacenter; the array, as well as its peer array in the opposite datacenter, is responsible for providing access to datastores in one datacenter to ESXi hosts in the opposite datacenter. EMC VPLEX is an example of a storage system that can be deployed as a nonuniform storage cluster, although it can also be configured in a uniform manner. VPLEX provides the concept of a "virtual LUN," which enables ESXi hosts in each datacenter to read and write to the same datastore or LUN. VPLEX technology maintains the cache state on each array, so ESXi hosts in either datacenter detect the LUN as local. EMC calls this solution "write anywhere." Even when two VMs reside on the same datastore but are located in different datacenters, they write locally without any performance impact on either VM. A key point with this configuration is that each LUN or datastore has "site affinity," also sometimes referred to as "site bias" or "LUN locality." In other words, if anything happens to the link between the sites, the storage system on the preferred site for a given datastore will be the only one remaining with read/write access to it. This prevents any data corruption in case of a failure scenario.
Figure 59 - Nonuniform Configuration
Our examples use uniform storage, because these configurations are currently the most commonly deployed. Many of the design considerations, however, also apply to nonuniform configurations. We point out exceptions when this is not the case.
Scenario Architecture

In this section we will describe the architecture deployed for this scenario. We will also discuss some of the basic configuration and behavior of the various vSphere features. For an in-depth explanation of each respective feature, refer to the HA and the DRS sections of this book. We will make specific recommendations based on VMware best practices and provide operational guidance where applicable. In our failure scenarios, it will be explained how these practices prevent or limit downtime.
Infrastructure
The described infrastructure consists of a single vSphere 6.0 cluster with four ESXi 6.0 hosts. These hosts are managed by a single vCenter Server 6.0 instance. The first site is called Frimley; the second site is called Bluefin. The network between Frimley datacenter and Bluefin datacenter is a stretched layer 2 network. There is a minimal distance between the sites, as is typical in campus cluster scenarios.
Each site has two ESXi hosts, and the vCenter Server instance is configured with vSphere DRS affinity to the hosts in Bluefin datacenter. In a stretched cluster environment, only a single vCenter Server instance is used. This is different from a traditional VMware Site Recovery Manager™ configuration, in which a dual vCenter Server configuration is required. The configuration of VM-to-host affinity rules is discussed in more detail in the "vSphere DRS" subsection of this document.
Eight LUNs are depicted in the diagram below. Four of these are accessed through the virtual IP address active on the iSCSI storage system in the Frimley datacenter; four are accessed through the virtual IP address active on the iSCSI storage system in the Bluefin datacenter.
Figure 60 - Test Environment
| Location | Hosts | Datastores | Local Isolation Address |
|----------|-------|------------|-------------------------|
| Bluefin | 172.16.103.184 | Bluefin01 | 172.16.103.10 |
| | 172.16.103.185 | Bluefin02 | n/a |
| | | Bluefin03 | n/a |
| | | Bluefin04 | n/a |
| Frimley | 172.16.103.182 | Frimley01 | 172.16.103.11 |
| | 172.16.103.183 | Frimley02 | n/a |
| | | Frimley03 | n/a |
| | | Frimley04 | n/a |
The vSphere cluster is connected to a stretched storage system in a fabric configuration with a uniform device access model. This means that every host in the cluster is connected to both storage heads. Each of the heads is connected to two switches, which are connected to two similar switches in the secondary location. For any given LUN, one of the two storage heads presents the LUN as read/write via iSCSI. The other storage head maintains the replicated, read-only copy that is effectively hidden from the ESXi hosts.
vSphere Configuration

Our focus in this section is on vSphere HA, vSphere DRS, and vSphere Storage DRS in relation to stretched cluster environments. Design and operational considerations regarding vSphere are commonly overlooked and underestimated. Much emphasis has traditionally been placed on the storage layer, but little attention has been applied to how workloads are provisioned and managed.
One of the key drivers for using a stretched cluster is workload balance and disaster avoidance. How do we ensure that our environment is properly balanced without impacting availability or severely increasing the operational expenditure? How do we build the requirements into our provisioning process and validate periodically that we still meet them? Ignoring the requirements makes the environment confusing to administer and less predictable during the various failure scenarios for which it should be of help.
Each of these three vSphere features has very specific configuration requirements and can enhance environment resiliency and workload availability. Architectural recommendations based on our findings during the testing of the various failure scenarios are given throughout this section.
vSphere HA
The environment has four hosts and a uniform stretched storage solution. A full site failure is one scenario that must be taken into account in a resilient architecture. VMware recommends enabling vSphere HA admission control. Workload availability is the primary driver for most stretched cluster environments, so providing sufficient capacity for a full site failure is recommended. The hosts are equally divided across both sites. To ensure that all workloads can be restarted by vSphere HA on just one site, configuring the admission control policy to 50 percent for both memory and CPU is recommended.
VMware recommends using a percentage-based policy because it offers the most flexibility and reduces operational overhead. Even when new hosts are introduced to the environment, there is no need to change the percentage, and there is no risk of a skewed consolidation ratio due to possible use of VM-level reservations.
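The 50 percent figure follows directly from the site topology. The small calculation below is our own illustration (not a VMware tool): reserve the share of cluster capacity contributed by the largest site, so the surviving site can restart everything after a full site failure.

```python
def admission_control_percentage(hosts_site_a, hosts_site_b):
    """Percentage of cluster CPU/memory to reserve so that one whole
    site can fail and the other site can absorb all restarts."""
    total = hosts_site_a + hosts_site_b
    return round(100 * max(hosts_site_a, hosts_site_b) / total)

print(admission_control_percentage(2, 2))  # 50, as in the scenario above
```

With two hosts per site, half the cluster's resources must stay free; with asymmetric sites, the larger site's share dictates the reservation.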
The screenshot below shows a vSphere HA cluster configured with admission control enabled and with the percentage-based policy set to 50 percent.
Figure 61 - vSphere HA Configuration
vSphere HA uses heartbeat mechanisms to validate the state of a host. There are two such mechanisms: network heartbeating and datastore heartbeating. Network heartbeating is the primary mechanism for vSphere HA to validate availability of the hosts. Datastore heartbeating is the secondary mechanism used by vSphere HA; it determines the exact state of the host after network heartbeating has failed.
If a host is not receiving any heartbeats, it uses a fail-safe mechanism to detect whether it is merely isolated from its master node or completely isolated from the network. It does this by pinging the default gateway. In addition to this mechanism, one or more isolation addresses can be specified manually to enhance the reliability of isolation validation. VMware recommends specifying a minimum of two additional isolation addresses, with each address site local.
In our scenario, one of these addresses physically resides in the Frimley datacenter; the other physically resides in the Bluefin datacenter. This enables vSphere HA validation for complete network isolation, even in case of a connection failure between sites. The next screenshot shows an example of how to configure multiple isolation addresses. The vSphere HA advanced setting used is das.isolationaddress. More details on how to configure this can be found in VMware Knowledge Base article 1002117.
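The isolation check can be sketched as follows. This is our own simplification of the logic, not VMware code: the ICMP probe is stubbed with a reachability set, the gateway address is an assumption, and the two isolation addresses are taken from the test environment table.

```python
def is_isolated(default_gateway, isolation_addresses, reachable):
    """A host declares itself isolated only when the default gateway AND
    every configured das.isolationaddress fail to respond."""
    probes = [default_gateway] + list(isolation_addresses)
    return all(addr not in reachable for addr in probes)

# One isolation address per site, per the test environment table.
probes = ["172.16.103.10", "172.16.103.11"]
# Assumed gateway address for illustration only:
gw = "172.16.103.1"
print(is_isolated(gw, probes, reachable={"172.16.103.11"}))  # False
print(is_isolated(gw, probes, reachable=set()))              # True
```

Because one probe address lives at each site, losing the inter-site link alone does not make a host wrongly declare itself isolated: the site-local address still answers.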
The minimum number of heartbeat datastores is two and the maximum is five. For vSphere HA datastore heartbeating to function correctly in any type of failure scenario, VMware recommends increasing the number of heartbeat datastores from two to four in a stretched cluster environment. This provides full redundancy for both datacenter locations. Defining four specific datastores as preferred heartbeat datastores is also recommended, selecting two from one site and two from the other. This enables vSphere HA to heartbeat to a datastore even in the case of a connection failure between sites. Subsequently, it enables vSphere HA to determine the state of a host in any scenario.
Adding an advanced setting called das.heartbeatDsPerHost can increase the number of heartbeat datastores. This is shown in the screenshot below.
Figure 62 - vSphere HA Advanced Settings
To designate specific datastores as heartbeat devices, VMware recommends using "Select any of the cluster datastores taking into account my preferences". This enables vSphere HA to select any other datastore if the four designated datastores that have been manually selected become unavailable. VMware recommends selecting two datastores in each location to ensure that datastores are available at each site in the case of a site partition.
Figure 63 - Datastore Heartbeating
Permanent Device Loss and All Paths Down Scenarios
As of vSphere 6.0, enhancements have been introduced to enable an automated failover of VMs residing on a datastore that has either an all paths down (APD) or a permanent device loss (PDL) condition. PDL is applicable only to block storage devices.
A PDL condition, as is discussed in one of our failure scenarios, is a condition that is communicated by the array controller to the ESXi host via a SCSI sense code. This condition indicates that a device (LUN) has become unavailable and is likely permanently unavailable. An example scenario in which this condition is communicated by the array is when a LUN is set offline. This condition is used in nonuniform models during a failure scenario to ensure that the ESXi host takes appropriate action when access to a LUN is revoked. When a full storage failure occurs, it is impossible to generate the PDL condition, because there is no communication possible between the array and the ESXi host. This state is identified by the ESXi host as an APD condition. Another example of an APD condition is when the storage network has failed completely. In this scenario, the ESXi host also does not detect what has happened with the storage and declares an APD.
To enable vSphere HA to respond to both an APD and a PDL condition, vSphere HA must be configured in a specific way. VMware recommends enabling VM Component Protection (VMCP). After the creation of the cluster, VMCP must be enabled, as is shown below.
Figure 64 - VM Component Protection
The configuration screen can be found as follows:
1. Log in to VMware vSphere Web Client.
2. Click Hosts and Clusters.
3. Click the cluster object.
4. Click the Manage tab.
5. Click vSphere HA and then Edit.
6. Select Protect against Storage Connectivity Loss.
7. Select individual functionality, as described in the following, by opening Failure conditions and VM response.
The configuration for PDL is basic. In the Failure conditions and VM response section, the response following detection of a PDL condition can be configured. VMware recommends setting this to Power off and restart VMs. When this condition is detected, a VM is restarted instantly on a healthy host within the vSphere HA cluster.
For an APD scenario, configuration must occur in the same section, as is shown in the screenshot below. Besides defining the response to an APD condition, it is also possible to alter the timing and to configure the behavior when the failure is restored before the APD timeout has passed.
Figure 65 - VMCP Detailed Configuration
When an APD condition is detected, a timer is started. After 140 seconds, the APD condition is officially declared and the device is marked as APD timeout. When 140 seconds have passed, vSphere HA starts counting. The default vSphere HA timeout is 3 minutes. When the 3 minutes have passed, vSphere HA restarts the impacted VMs, but VMCP can be configured to respond differently if preferred. VMware recommends configuring it to Power off and restart VMs (conservative).
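The timeline above can be summed up in a few lines. This sketch only arranges the two default values from the text (the 140-second APD timeout and the 3-minute vSphere HA delay); it is an illustration, not configuration code.

```python
APD_TIMEOUT_S = 140      # device marked "APD timeout" after this many seconds
HA_APD_DELAY_S = 3 * 60  # default vSphere HA delay after APD is declared

def apd_restart_time(apd_start_s=0):
    """Earliest moment (in seconds) at which vSphere HA restarts the
    impacted VMs, measured from when the APD condition began."""
    return apd_start_s + APD_TIMEOUT_S + HA_APD_DELAY_S

print(apd_restart_time())  # 320 seconds after the APD began
```

In other words, with default settings an impacted VM is restarted no earlier than 320 seconds after the storage became unreachable.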
Conservative refers to the likelihood that vSphere HA will be able to restart VMs. When set to conservative, vSphere HA restarts only the VM that is impacted by the APD if it detects that a host in the cluster can access the datastore on which the VM resides. In the case of aggressive, vSphere HA attempts to restart the VM even if it doesn't detect the state of the other hosts. This can lead to a situation in which a VM is not restarted, because there is no host that has access to the datastore on which the VM is located.
If the APD is lifted and access to the storage is restored before the timeout has passed, vSphere HA does not unnecessarily restart the VM, unless explicitly configured to do so. If a response is desired even when the environment has recovered from the APD condition, Response for APD recovery after APD timeout can be configured to Reset VMs. VMware recommends leaving this setting disabled.
With the release of vSphere 5.5, an advanced setting called Disk.AutoremoveOnPDL was introduced. It is enabled by default. This functionality enables vSphere to remove devices that are marked as PDL and helps prevent reaching, for example, the 256-device limit for an ESXi host. However, if the PDL scenario is resolved and the device returns, the ESXi host's storage system must be rescanned before this device appears. VMware recommends disabling Disk.AutoremoveOnPDL in the host advanced settings by setting it to 0.
Figure 66 - Disk.AutoremoveOnPDL
vSphere DRS
vSphere DRS is used in many environments to distribute load within a cluster. It offers many other features that can be very helpful in stretched cluster environments. VMware recommends enabling vSphere DRS to facilitate load balancing across hosts in the cluster. The vSphere DRS load-balancing calculation is based on CPU and memory use. Care should be taken with regard to both storage and networking resources as well as to traffic flow. To avoid storage and network traffic overhead in a stretched cluster environment, VMware recommends implementing vSphere DRS affinity rules to enable a logical separation of VMs. This subsequently helps improve availability. For VMs that are responsible for infrastructure services, such as Microsoft Active Directory and DNS, it assists by ensuring separation of these services across sites.
vSphere DRS affinity rules also help prevent unnecessary downtime, and storage and network traffic flow overhead, by enforcing preferred site affinity. VMware recommends aligning vSphere VM-to-host affinity rules with the storage configuration; that is, setting VM-to-host affinity rules with a preference that a VM run on a host at the same site as the array that is configured as the primary read/write node for a given datastore. For example, in our test configuration, VMs stored on the Frimley01 datastore are set with VM-to-host affinity with a preference for hosts in the Frimley datacenter. This ensures that in the case of a network connection failure between sites, VMs do not lose connection with the storage system that is primary for their datastore. VM-to-host affinity rules aim to ensure that VMs stay local to the storage primary for that datastore. This coincidentally also results in all read I/O staying local.
NOTE: Different storage vendors use different terminology to describe the relationship of a LUN to a particular array or controller. For the purposes of this document, we use the generic term "storage site affinity," which refers to the preferred location for access to a given LUN.
VMware recommends implementing "should rules", because these can be violated by vSphere HA in the case of a full site failure. Availability of services should always prevail. In the case of "must rules," vSphere HA does not violate the rule set, and this can potentially lead to service outages. In the scenario where a full datacenter fails, "must rules" do not allow vSphere HA to restart the VMs, because the VMs do not have the required affinity to start on the hosts in the other datacenter. This necessitates the recommendation to implement "should rules." vSphere DRS communicates these rules to vSphere HA, and they are stored in a "compatibility list" governing allowed start-up. If a single host fails, VM-to-host "should rules" are ignored by default. VMware recommends configuring vSphere HA rule settings to respect VM-to-host affinity rules where possible. With a full site failure, vSphere HA can restart the VMs on hosts that violate the rules. Availability takes preference in this scenario.
Figure 67 - HA Affinity Rule Settings
Under certain circumstances, such as massive host saturation coupled with aggressive recommendation settings, vSphere DRS can also violate "should rules." Although this is very rare, we recommend monitoring for violation of these rules, because a violation might impact availability and workload performance.
VMware recommends manually defining "sites" by creating a group of hosts that belong to a site and then adding VMs to these sites based on the affinity of the datastore on which they are provisioned. In our scenario, only a limited number of VMs were provisioned. VMware recommends automating the process of defining site affinity by using tools such as VMware vCenter Orchestrator™ or VMware vSphere PowerCLI™. If automating the process is not an option, use of a generic naming convention is recommended to simplify the creation of these groups. VMware recommends that these groups be validated on a regular basis to ensure that all VMs belong to the group with the correct site affinity.
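The naming-convention approach can be illustrated as follows. This is our own sketch (the VM names are made up for illustration): a VM's site group is derived from the site prefix of the datastore it lives on, which is exactly what an automation tool would do at scale.

```python
SITES = ("Frimley", "Bluefin")

def site_for_datastore(datastore_name):
    """Derive the site group from the datastore's site-prefixed name."""
    for site in SITES:
        if datastore_name.startswith(site):
            return site
    raise ValueError(f"no site prefix in {datastore_name!r}")

# vm -> datastore placements (hypothetical VM names, real datastore names
# from the test environment table)
placements = {"vm01": "Frimley01", "vm02": "Bluefin03"}
groups = {vm: site_for_datastore(ds) for vm, ds in placements.items()}
print(groups)  # {'vm01': 'Frimley', 'vm02': 'Bluefin'}
```

Running such a mapping periodically and comparing it against the actual DRS VM group membership is one way to perform the regular validation recommended above.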
The following screenshots depict the configuration used for our scenario. In the first screenshot, all VMs that should remain local to the Bluefin datacenter are added to the Bluefin VM group.
Figure 68 - VM Group
Next, a Bluefin host group is created that contains all hosts residing in this location.
Figure 69 - Host Group
Next, a new rule is created that is defined as a "should run on" rule. It links the host group and the VM group for the Bluefin location.
Figure 70 - Rule Definition
This should be done for both locations, which should result in two rules.
Figure 71 - VM/Host Rules
Correcting Affinity Rule Violation
vSphere DRS assigns a high priority to correcting affinity rule violations. During invocation, the primary goal of vSphere DRS is to correct any violations and generate recommendations to migrate VMs to the hosts listed in the host group. These migrations have a higher priority than load-balancing moves and are started before them.
vSphere DRS is invoked every 5 minutes by default, but it is also triggered if the cluster detects changes. For instance, when a host reconnects to the cluster, vSphere DRS is invoked and generates recommendations to correct the violation. Our testing has shown that vSphere DRS generates recommendations to correct affinity rule violations within 30 seconds after a host reconnects to the cluster. vSphere DRS is limited by the overall capacity of the vSphere vMotion network, so it might take multiple invocations before all affinity rule violations are corrected.
vSphere Storage DRS
vSphere Storage DRS enables aggregation of datastores into a single unit of consumption from an administrative perspective, and it balances VM disks when defined thresholds are exceeded. It ensures that sufficient disk resources are available to a workload. VMware recommends enabling vSphere Storage DRS with I/O Metric disabled. The use of I/O Metric or VMware vSphere Storage I/O Control is not supported in a vMSC configuration, as is described in VMware Knowledge Base article 2042596.
Figure 72 - Storage DRS Configuration
vSphere Storage DRS uses vSphere Storage vMotion to migrate VM disks between datastores within a datastore cluster. Because the underlying stretched storage systems use synchronous replication, a migration or series of migrations has an impact on replication traffic and might cause the VMs to become temporarily unavailable due to contention for network resources during the movement of disks. Migration to random datastores can also potentially lead to additional I/O latency in uniform host access configurations if VMs are not migrated along with their virtual disks. For example, if a VM residing on a host at site A has its disk migrated to a datastore at site B, it continues operating but with potentially degraded performance. The VM's disk reads are now subject to the increased latency associated with reading from the virtual iSCSI IP at site B. Reads are subject to intersite latency rather than being satisfied by a local target.
To control if and when migrations occur, VMware recommends configuring vSphere Storage DRS in manual mode. This enables human validation of each recommendation and allows recommendations to be applied during off-peak hours, while still gaining the operational benefit and efficiency of the initial placement functionality.
VMware recommends creating datastore clusters based on the storage configuration with respect to storage site affinity. Datastores with a site affinity for site A should not be mixed in datastore clusters with datastores with a site affinity for site B. This enables operational consistency and eases the creation and ongoing management of vSphere DRS VM-to-host affinity rules. Ensure that all vSphere DRS VM-to-host affinity rules are updated accordingly when VMs are migrated via vSphere Storage vMotion between datastore clusters and when crossing defined storage site affinity boundaries. To simplify the provisioning process, VMware recommends aligning naming conventions for datastore clusters and VM-to-host affinity rules.
Figure 73 - Datastore Clusters
The naming convention used in our testing gives both datastores and datastore clusters a site-specific name to facilitate the alignment of vSphere DRS host affinity with VM deployment in the correct site.
Failure Scenarios

There are many failures that can be introduced in clustered systems. But in a properly architected environment, vSphere HA, vSphere DRS, and the storage subsystem do not detect many of these. We do not address the zero-impact failures, such as the failure of a single network cable, because they are explained in depth in the documentation provided by the storage vendors of the various solutions. We discuss the following "common" failure scenarios:
- Single-host failure in Frimley datacenter
- Single-host isolation in Frimley datacenter
- Storage partition
- Datacenter partition
- Disk shelf failure in Frimley datacenter
- Full storage failure in Frimley datacenter
- Full compute failure in Frimley datacenter
- Full compute failure in Frimley datacenter and full storage failure in Bluefin datacenter
- Loss of complete Frimley datacenter
We also examine scenarios in which specific settings are incorrectly configured. These settings determine the availability and recoverability of VMs in a failure scenario. It is important to understand the impact of misconfigurations such as the following:
- Incorrectly configured VM-to-host affinity rules
- Incorrectly configured heartbeat datastores
- Incorrectly configured isolation address
- Incorrectly configured PDL handling
- vCenter Server split-brain scenario
Single-Host Failure in Frimley Data Center
In this scenario, we describe the complete failure of a host in Frimley datacenter. This scenario is depicted below.
Figure 74 - Single-Host Failure Scenario
Result: vSphere HA successfully restarted all VMs in accordance with VM-to-host affinity rules.
Explanation: If a host fails, the cluster's vSphere HA master node detects the failure because it no longer receives network heartbeats from the host. The master then starts monitoring for datastore heartbeats. Because the host has failed completely, it cannot generate datastore heartbeats; these too are detected as missing by the vSphere HA master node. During this time, a third availability check, pinging the management addresses of the failed host, is conducted. If all of these checks return as unsuccessful, the master declares the missing host dead and attempts to restart all the protected VMs that had been running on the host before the master lost contact with the host.
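The master's three checks can be modeled as a small decision function. This is a simplified sketch of the logic just described, not VMware code, and the state names are our own labels: a host is declared dead only when all three checks fail.

```python
def host_state(net_heartbeat, ds_heartbeat, ping_ok):
    """Simplified model of the master's liveness determination."""
    if net_heartbeat:
        return "live"
    if ds_heartbeat:
        # Heartbeating via a datastore but not the network: the host is
        # up but isolated or partitioned, not dead.
        return "isolated-or-partitioned"
    if ping_ok:
        # Answers ping but sends no heartbeats at all.
        return "unreachable"
    return "dead"  # all checks failed: restart the protected VMs

print(host_state(net_heartbeat=False, ds_heartbeat=False, ping_ok=False))
# -> dead
```

The datastore heartbeat is what lets the master distinguish the scenario in this section (a truly failed host) from the isolation scenario in the next one.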
The vSphere VM-to-host affinity rules defined at the cluster level are "should rules." Because the vSphere HA rule settings specify that VM-to-host affinity rules should be respected, all VMs are restarted within the correct site.
However, if the host elements of the VM-to-host group are temporarily without resources, or if they are unavailable for restarts for any other reason, vSphere HA can disregard the rules and restart the remaining VMs on any of the remaining hosts in the cluster, regardless of location and rules. If this occurs, vSphere DRS attempts to correct any violated affinity rules at the first invocation and automatically migrates VMs in accordance with their affinity rules to bring VM placement into alignment. VMware recommends manually invoking vSphere DRS after the cause of the failure has been identified and resolved. This ensures that all VMs are placed on hosts in the correct location to avoid possible performance degradation due to misplacement.
Single-Host Isolation in Frimley Data Center
In this scenario, we describe the response to isolation of a single host in Frimley datacenter from the rest of the network.
Figure 75 - Single-Host Isolation Scenario
Result: VMs remain running because the isolation response is configured to Leave powered on.
Explanation: When a host is isolated, the vSphere HA master node detects the isolation because it no longer receives network heartbeats from the host. The master then starts monitoring for datastore heartbeats. Because the host is isolated, it generates datastore heartbeats for the secondary vSphere HA detection mechanism. Detection of valid host heartbeats enables the vSphere HA master node to determine that the host is running but is isolated from the network. Depending on the isolation response configured, the impacted host can power off or shut down VMs, or can leave them powered on. The isolation response is triggered 30 seconds after the host has detected that it is isolated.
VMware recommends aligning the isolation response to business requirements and physical constraints. From a best practices perspective, Leave powered on is the recommended isolation response setting for the majority of environments. Isolated hosts are rare in a properly architected environment, given the built-in redundancy of most modern designs. In environments that use network-based storage protocols, such as iSCSI and NFS, and where networks are converged, the recommended isolation response is Power off. In these environments, it is more likely that a network outage that causes a host to become isolated also affects the host's ability to communicate with the datastores.
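The decision rule above can be stated compactly. This is an illustration of the recommendation as described, not a policy engine; the two boolean inputs are our own parameterization.

```python
def recommended_isolation_response(network_based_storage, converged_network):
    """Pick an isolation response from the storage/network design, per the
    best-practice guidance above."""
    if network_based_storage and converged_network:
        # Host isolation likely also severs datastore access, so power the
        # VMs off so they can be restarted on hosts that still have storage.
        return "Power off"
    return "Leave powered on"

print(recommended_isolation_response(True, True))    # Power off
print(recommended_isolation_response(False, False))  # Leave powered on
```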
If an isolation response different from the recommended Leave powered on is selected, and a Power off or Shut down response is triggered, the vSphere HA master restarts the VMs on the remaining nodes in the cluster. The vSphere VM-to-host affinity rules defined at the cluster level are "should rules." However, because the vSphere HA rule settings specify that the vSphere HA VM-to-host affinity rules should be respected, all VMs are restarted within the correct site under "normal" circumstances.
Storage Partition
In this scenario, a failure has occurred on the storage network between datacenters, as is depicted below.
Figure 76 - Storage Partition Scenario
Result: VMs remain running with no impact.
Explanation: Storage site affinity is defined for each LUN, and the vSphere DRS rules align with this affinity. Therefore, because storage remains available within each site, no VM is impacted.
If for any reason the affinity rule for a VM has been violated, and the VM is running on a host in Frimley datacenter while its disk resides on a datastore that has affinity with Bluefin datacenter, it cannot successfully issue I/O following an intersite storage partition. This is because the datastore is in an APD condition. In this scenario, the VM can be restarted because vSphere HA is configured to respond to APD conditions. The response occurs after the 3-minute grace period has passed. This 3-minute period starts after the APD timeout of 140 seconds has passed and the APD condition has been declared.
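The APD timing described above can be sketched as a small calculation. The 140-second APD timeout and the 3-minute grace period come from the text; the helper function itself is purely illustrative and not part of any VMware API.

```python
# Illustrative timeline for the APD response described above.
# Constants come from the text: a 140-second APD timeout followed
# by a 3-minute grace period before vSphere HA may restart the VM.

APD_TIMEOUT_S = 140          # time until the APD condition is declared
APD_GRACE_PERIOD_S = 3 * 60  # grace period after APD is declared

def seconds_until_ha_restart(io_failure_at_s: float) -> float:
    """Earliest time (seconds on a shared clock) at which vSphere HA
    may restart a VM whose datastore lost access at io_failure_at_s."""
    apd_declared_at = io_failure_at_s + APD_TIMEOUT_S
    return apd_declared_at + APD_GRACE_PERIOD_S

# A VM losing datastore access at t=0 becomes eligible for restart
# roughly 320 seconds (140 + 180) later.
print(seconds_until_ha_restart(0))  # 320
```

In other words, an affinity-rule violation in this scenario costs roughly five and a half minutes of downtime before the restart is even attempted.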
To avoid unnecessary downtime in an APD scenario, VMware recommends monitoring compliance of vSphere DRS rules. Although vSphere DRS is invoked every 5 minutes, this does not guarantee resolution of all affinity rule violations. Therefore, to prevent unnecessary downtime, rigid monitoring is recommended that enables quick identification of anomalies such as a VM's compute residing in one site while its storage resides in the other site.
Data Center Partition

In this scenario, the Frimley data center is isolated from the Bluefin data center, as is depicted below.

Figure 77 - Data Center Partition Scenario

Result: VMs remain running with no impact.
Explanation: In this scenario, the two data centers are fully isolated from each other. This scenario is similar to both the storage partition and the host isolation scenario. VMs are not impacted by this failure because vSphere DRS rules were correctly implemented and no rules were violated.

vSphere HA follows this logical process to determine which VMs require restarting during a cluster partition:

The vSphere HA master node running in Frimley data center detects that all hosts in Bluefin data center are unreachable. It first detects that no network heartbeats are being received. It then determines whether any storage heartbeats are being generated. This check does not detect storage heartbeats because the storage connection between sites has also failed, and the heartbeat datastores are updated only "locally." Because the VMs with affinity to the remaining hosts are still running, no action is needed for them. Next, vSphere HA determines whether a restart can be attempted. However, the read/write version of the datastores located in Bluefin data center is not accessible by the hosts in Frimley data center. Therefore, no attempt is made to start the missing VMs.

Similarly, the ESXi hosts in Bluefin data center detect that there is no master available, and they initiate a master election process. After the master has been elected, it tries to determine which VMs had been running before the failure and attempts to restart them. Because all VMs with affinity to Bluefin data center are still running there, there is no need for a restart. Only the VMs with affinity to Frimley data center are unavailable, and vSphere HA cannot restart them because the datastores on which they are stored have affinity with Frimley data center and are unavailable in Bluefin data center.

If VM-to-host affinity rules have been violated, that is, VMs have been running at a location where their storage is not defined as read/write by default, the behavior changes. The following sequence describes what would happen in that case:

1. The VM with affinity to Frimley data center but residing in Bluefin data center is unable to reach its datastore. This results in the VM being unable to write to or read from disk.
2. In Frimley data center, this VM is restarted by vSphere HA because the hosts in Frimley data center do not detect the instance running in Bluefin data center.
3. Because the datastore is available only to Frimley data center, one of the hosts in Frimley data center acquires a lock on the VMDK and is able to power on this VM.
4. This can result in a scenario in which the same VM is powered on and running in both data centers.
Figure 78 - Ghost VM

If the APD response is configured to Power off and restart VMs (aggressive), as is recommended in the VM Component Protection section of this publication, the VM is powered off after the APD timeout and the grace period have passed. This behavior is new in vSphere 6.0.

If the APD response is not correctly configured, two VMs will be running, for the following possible reasons:

- The network heartbeat from the host that is running this VM is missing because there is no connection to that site.
- The datastore heartbeat is missing because there is no connection to that site.
- A ping to the management address of the host that is running the VM fails because there is no connection to that site.
- The master located in Frimley data center detects that the VM had been powered on before the failure. Because it is unable to communicate with the VM's host in Bluefin data center after the failure, it attempts to restart the VM because it cannot detect the actual state.
If the connection between sites is restored, a classic "VM split-brain scenario" will exist. For a short period of time, two copies of the VM will be active on the network, with both having the same MAC address. Only one copy, however, will have access to the VM files, and vSphere HA will detect this. As soon as this is detected, all processes belonging to the VM copy that has no access to the VM files will be killed, as is depicted below.

Figure 79 - Tasks and Events

In this example, the downtime equates to a VM's having to be restarted. Proper maintenance of site affinity can prevent this. To avoid unnecessary downtime, VMware recommends close monitoring to ensure that vSphere DRS rules align with datastore site affinity.

Disk Shelf Failure in Frimley Data Center

In this scenario, one of the disk shelves in Frimley data center has failed. Both Frimley01 and Frimley02 on Storage A are impacted.
Figure 80 - Disk Shelf Failure Scenario

Result: VMs remain running with no impact.

Explanation: In this scenario, only a disk shelf in Frimley data center has failed. The storage processor has detected the failure and has instantly switched from the primary disk shelf in Frimley data center to the mirror copy in Bluefin data center. There is no noticeable impact to any of the VMs except for a typical short spike in I/O response time. The storage solution fully detects and handles this scenario. There is no need for a rescan of the datastores or the HBAs because the switchover is seamless and the LUNs are identical from the ESXi perspective.

Full Storage Failure in Frimley Data Center
In this scenario, a full storage system failure has occurred in Frimley data center.

Figure 81 - Full Storage Failure Scenario

Result: VMs remain running with no impact.

Explanation: When the full storage system fails in Frimley data center, a takeover command must be initiated manually. As described previously, we used a NetApp MetroCluster configuration to describe this behavior. This takeover command is particular to NetApp environments; depending on the implemented storage system, the required procedure can differ. After the command has been initiated, the mirrored, read-only copy of each of the failed datastores is set to read/write and is instantly accessible. We have described this process at an extremely high level. For more details, refer to the storage vendor's documentation.
From the VM perspective, this failover is seamless: the storage controllers handle this, and no action is required from either the vSphere or storage administrator. All I/O now passes across the intersite connection to the other data center, because VMs remain running in Frimley data center while their datastores are accessible only in Bluefin data center.

vSphere HA does not detect this type of failure. Although the datastore heartbeat might be lost briefly, vSphere HA does not take action, because the vSphere HA master agent checks for the datastore heartbeat only when the network heartbeat has not been received for 3 seconds. Because the network heartbeat remains available throughout the storage failure, vSphere HA is not required to initiate any restarts.
Permanent Device Loss

In the scenario shown in the diagram below, a permanent device loss (PDL) condition occurs because datastore Frimley01 has been taken offline for ESXi-01 and ESXi-02. PDL scenarios are uncommon in uniform configurations and are more likely to occur in a nonuniform vMSC configuration. However, a PDL scenario can, for instance, occur when the configuration of a storage group changes, as in the case of this described scenario.
Figure 82 - Permanent Device Loss

Result: VMs are restarted by vSphere HA on ESXi-03 and ESXi-04.

Explanation: When the PDL condition occurs, VMs running on datastore Frimley01 on hosts ESXi-01 and ESXi-02 are killed instantly. They are then restarted by vSphere HA on hosts within the cluster that have access to the datastore, ESXi-03 and ESXi-04 in this scenario. The PDL and the killing of the VM world group can be witnessed by following the entries in the vmkernel.log file located in /var/log/ on the ESXi hosts. The following is an outtake of the vmkernel.log file where a PDL is recognized and appropriate action is taken.

2012-03-14T13:39:25.085Z cpu7:4499)WARNING: VSCSI: 4055: handle 8198 (vscsi4:0): opened by wid 4499 (vmm0:fri-iscsi-02) has Permanent Device Loss. Killing world group leader 4491
VMware recommends configuring Response for Datastore with Permanent Device Loss (PDL) to Power off and restart VMs. This setting ensures that appropriate action is taken when a PDL condition exists. The correct configuration is shown below.

Figure 83 - APD/PDL Configuration

Full Compute Failure in Frimley Data Center

In this scenario, a full compute failure has occurred in Frimley data center.

Figure 84 - Full Compute Failure Scenario

Result: All VMs are successfully restarted in Bluefin data center.
Explanation: The vSphere HA master was located in Frimley data center at the time of the full compute failure at that location. After the hosts in Bluefin data center detected that no network heartbeats had been received, an election process was started. Within approximately 20 seconds, a new vSphere HA master was elected from the remaining hosts. The new master then determined which hosts had failed and which VMs had been impacted by this failure. Because all hosts at the other site had failed and all VMs residing on them had been impacted, vSphere HA initiated the restart of all of these VMs. vSphere HA can initiate 32 concurrent restarts on a single host, providing a low restart latency for most environments. The only sequencing of start order comes from the broad high, medium, and low restart priority categories for vSphere HA. This priority must be set on a per-VM basis. In the test, these priorities were adhered to: high-priority VMs started first, followed by medium-priority and low-priority VMs.

As part of the test, the hosts at the Frimley data center were again powered on. As soon as vSphere DRS detected that these hosts were available, a vSphere DRS run was invoked. Because the initial vSphere DRS run corrects only the vSphere DRS affinity rule violations, resource imbalance was not corrected until the next full invocation of vSphere DRS. vSphere DRS is invoked by default every 5 minutes or when VMs are powered off or on through the use of the vCenter Web Client.
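The restart ordering described above, broad priority categories plus a per-host concurrency cap, can be sketched as follows. The data structures and function name are illustrative only; the real FDM scheduler considers many more factors (placement, resources, retries).

```python
from collections import deque

# Sketch of the restart ordering described above: vSphere HA sequences
# restarts only by the broad per-VM restart priority (high, medium, low)
# and issues up to 32 concurrent power-ons per host.

PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}
MAX_CONCURRENT_RESTARTS_PER_HOST = 32

def restart_batches(vms):
    """vms: list of (vm_name, priority) tuples. Returns batches of at
    most 32 VM names, ordered high -> medium -> low, roughly as one
    host would issue the power-ons."""
    queue = deque(sorted(vms, key=lambda vm: PRIORITY_ORDER[vm[1]]))
    batches = []
    while queue:
        batch = [queue.popleft()[0]
                 for _ in range(min(len(queue), MAX_CONCURRENT_RESTARTS_PER_HOST))]
        batches.append(batch)
    return batches

vms = [("db01", "high"), ("web01", "low"), ("app01", "medium"), ("db02", "high")]
print(restart_batches(vms))  # [['db01', 'db02', 'app01', 'web01']]
```

With fewer than 32 impacted VMs, as here, everything lands in a single concurrent batch; only the ordering within it reflects the priority categories.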
Loss of Frimley Data Center

In this scenario, a full failure of Frimley data center is simulated.
Figure 85 - Full Data Center Failure Scenario

Result: All VMs were successfully restarted in Bluefin data center.

Explanation: In this scenario, the hosts in Bluefin data center lost contact with the vSphere HA master and elected a new vSphere HA master. Because the storage system had failed, a takeover command had to be initiated on the surviving site, again due to the NetApp-specific process. After the takeover command had been initiated, the new vSphere HA master accessed the per-datastore files that vSphere HA uses to record the set of protected VMs. The vSphere HA master then attempted to restart the VMs that were not running on the surviving hosts in Bluefin data center. In our scenario, all VMs were restarted within 2 minutes after failure and were fully accessible and functional again.

NOTE: By default, vSphere HA stops attempting to start a VM after 30 minutes. If the storage team does not issue a takeover command within that time frame, the vSphere administrator must manually start up VMs after the storage becomes available.
Stretched Cluster using VSAN

This question keeps on coming up over and over again lately: Stretched Cluster using Virtual SAN, can I do it? When Virtual SAN was first released, the answer to this question was a clear no. Virtual SAN did not allow a "traditional" stretched deployment using 2 "data" sites and a third "witness" site. A regular Virtual SAN cluster stretched across 3 sites within campus distance, however, was possible. Virtual SAN 6.1, however, introduced support for the "traditional" stretched cluster deployment.

Figure 86 - Stretched Virtual SAN Configuration

Everything learned in this publication also applies to a stretched Virtual SAN cluster, meaning all HA and DRS best practices. There are a couple of differences, though, at the time of writing between a vSphere Metro Storage Cluster and a VSAN Stretched Cluster, and in this section we will call out these differences. Please note that there is an extensive Virtual SAN Stretched Clustering Guide available, written by Cormac Hogan, and there is a full Virtual SAN book available, written by Cormac Hogan and myself (Duncan Epping). If you want to know more details about Virtual SAN, we would like to refer you to these two publications.
The first thing that needs to be looked at is the network. From a Virtual SAN perspective there are clear requirements:

- 5 ms RTT latency max between data sites
- 200 ms RTT latency max between data and witness site
- Both L3 and L2 are supported between the data sites
- 10 Gbps bandwidth is recommended between data sites; dependent on the number of VMs this could be lower or higher, more guidance will be provided soon around this!
- Multicast is required, which means that if L3 is used, some form of multicast routing is needed
- L3 is expected between the data and the witness sites
- 100 Mbps bandwidth is recommended to the witness site; dependent on the number of VMs this could be lower or higher, more guidance will be provided soon around this!
- No multicast is required to the witness site
When it comes to HA and DRS, the configuration is pretty straightforward. There are a couple of things we want to point out, as they are configuration details which are easy to forget about. Some are discussed in-depth above; some are settings you actually do not use with VSAN. We will point this out in the list below:

- Make sure to specify additional isolation addresses, one in each site (das.isolationAddress0 and das.isolationAddress1).
- Disable the default isolation address if it can't be used to validate the state of the environment during a partition (if the gateway isn't available in both sites).
- Disable Datastore heartbeating; without traditional external storage there is no reason to have this.
- Enable HA Admission Control and make sure it is set to 50% for CPU and Memory.
- Keep VMs local by creating "VM/Host" should rules.

That covers most of it, summarized relatively briefly compared to the excellent document Cormac developed with all the details you could wish for. Make sure to read that if you want to know every aspect and angle of a stretched Virtual SAN cluster configuration.
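The checklist above lends itself to a simple sanity check. The sketch below encodes those recommendations against a plain configuration dictionary; the dictionary layout and function are hypothetical and do not reflect any real vSphere API object.

```python
# Hedged sketch: validate a cluster configuration dictionary against
# the stretched VSAN recommendations listed above. The config layout
# is illustrative, not a real vSphere API structure.

def check_stretched_vsan_ha(config: dict) -> list:
    """Return a list of warnings for settings that deviate from the
    recommendations in this section."""
    warnings = []
    # One isolation address per site: das.isolationAddress0/1.
    for key in ("das.isolationAddress0", "das.isolationAddress1"):
        if key not in config.get("advanced_settings", {}):
            warnings.append(f"missing isolation address: {key}")
    # Datastore heartbeating serves no purpose without external storage.
    if config.get("datastore_heartbeating", True):
        warnings.append("datastore heartbeating should be disabled with VSAN")
    # Admission control should reserve half the cluster for site failover.
    if not config.get("admission_control_enabled", False):
        warnings.append("admission control should be enabled")
    elif config.get("failover_capacity_pct") != 50:
        warnings.append("admission control should reserve 50% CPU and memory")
    return warnings

cfg = {
    "advanced_settings": {"das.isolationAddress0": "192.168.1.1"},
    "datastore_heartbeating": False,
    "admission_control_enabled": True,
    "failover_capacity_pct": 50,
}
print(check_stretched_vsan_ha(cfg))  # ['missing isolation address: das.isolationAddress1']
```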
Advanced Settings

There are various types of KB articles and this KB article explains it, but let me summarize it and simplify it a bit to make it easier to digest.

There are various sorts of advanced settings, but for HA three in particular:

- das.* -> Cluster level advanced setting
- fdm.* -> FDM host level advanced setting
- vpxd.* -> vCenter level advanced setting

How do you configure these? Configuring these is typically straightforward, and most of you hopefully know this already. If not, let us go over the steps to help configure your environment as desired.
Cluster Level

In the Web Client: Click "Hosts and Clusters", click your cluster object, click the "Manage" tab, click "Settings" and "vSphere HA", and hit the "Edit" button.

FDM Host Level

Open up an SSH session to your host and edit /etc/opt/vmware/fdm/fdm.cfg.

vCenter Level

In the Web Client: Click "vCenter", click "vCenter Servers", select the appropriate vCenter Server, click the "Manage" tab, and click "Settings" and "Advanced Settings".
In this section we will primarily focus on the ones most commonly used; a full detailed list can be found in KB 2033250. Please note that each bullet details the versions which support the advanced setting.

das.maskCleanShutdownEnabled - 5.0, 5.1, 5.5
Whether the clean shutdown flag will default to false for an inaccessible and powered-off VM. Enabling this option will trigger VM failover if the VM's home datastore isn't accessible when it dies or is intentionally powered off.
das.ignoreInsufficientHbDatastore - 5.0, 5.1, 5.5, 6.0
Suppresses the host config issue that the number of heartbeat datastores is less than das.heartbeatDsPerHost. Default value is "false". Can be configured as "true" or "false".

das.heartbeatDsPerHost - 5.0, 5.1, 5.5, 6.0
The number of required heartbeat datastores per host. The default value is 2; the value should be between 2 and 5.

das.failuredetectiontime - 4.1 and prior
Number of milliseconds, timeout time, for the isolation response action (with a default of 15000 milliseconds). Pre-vSphere 4.0 it was a general best practice to increase the value to 60000 when an active/standby Service Console setup was used. This is no longer needed. For a host with two Service Consoles or a secondary isolation address, a failure detection time of 15000 is recommended.

das.isolationaddress[x] - 5.0, 5.1, 5.5, 6.0
IP address the ESXi hosts use to check for isolation when no heartbeats are received, where [x] = 0-9. (See the screenshot below for an example.) VMware HA will use the default gateway as an isolation address and the provided value as an additional checkpoint. I recommend adding an isolation address when a secondary service console is being used for redundancy purposes.

das.usedefaultisolationaddress - 5.0, 5.1, 5.5, 6.0
Value can be "true" or "false" and needs to be set to false in case the default gateway, which is the default isolation address, should not or cannot be used for this purpose. In other words, if the default gateway is a non-pingable address, set "das.isolationaddress0" to a pingable address and disable the usage of the default gateway by setting this to "false".

das.isolationShutdownTimeout - 5.0, 5.1, 5.5, 6.0
Time in seconds to wait for a VM to become powered off after initiating a guest shutdown, before forcing a power off.

das.allowNetwork[x] - 5.0, 5.1, 5.5
Enables the use of port group names to control the networks used for VMware HA, where [x] = 0-?. You can set the value to be "Service Console 2" or "Management Network" to use (only) the networks associated with those port group names in the networking configuration. In 5.5 this option is ignored when VSAN is enabled, by the way!

das.bypassNetCompatCheck - 4.1 and prior
Disables the "compatible network" check for HA that was introduced with ESX 3.5 Update 2. Disabling this check will enable HA to be configured in a cluster which contains hosts in different subnets, so-called incompatible networks. Default value is "false"; setting it to "true" disables the check.
das.ignoreRedundantNetWarning - 5.0, 5.1, 5.5
Removes the error icon/message from your vCenter when you don't have a redundant Service Console connection. Default value is "false"; setting it to "true" will disable the warning. HA must be reconfigured after setting the option.

das.vmMemoryMinMB - 5.0, 5.1, 5.5
The minimum default slot size used for calculating failover capacity. Higher values will reserve more space for failovers. Do not confuse with "das.slotMemInMB".

das.slotMemInMB - 5.0, 5.1, 5.5
Sets the slot size for memory to the specified value. This advanced setting can be used when a virtual machine with a large memory reservation skews the slot size, as this will typically result in an artificially conservative number of available slots.

das.vmCpuMinMHz - 5.0, 5.1, 5.5
The minimum default slot size used for calculating failover capacity. Higher values will reserve more space for failovers. Do not confuse with "das.slotCpuInMHz".

das.slotCpuInMHz - 5.0, 5.1, 5.5
Sets the slot size for CPU to the specified value. This advanced setting can be used when a virtual machine with a large CPU reservation skews the slot size, as this will typically result in an artificially conservative number of available slots.
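The effect of the two slot-size caps can be sketched with a small calculation. This is a simplified model, assuming only that the slot size is driven by the largest reservations and optionally capped by das.slotCpuInMHz / das.slotMemInMB; the real admission control algorithm has more inputs (memory overhead, per-host capacity, and so on).

```python
# Simplified sketch of how das.slotCpuInMHz / das.slotMemInMB cap the
# slot size that HA derives from the largest CPU and memory
# reservations in the cluster. Illustrative only.

def slot_size(vm_reservations, slot_cpu_cap_mhz=None, slot_mem_cap_mb=None):
    """vm_reservations: list of (cpu_reservation_mhz, mem_reservation_mb).
    Returns the (cpu_mhz, mem_mb) slot size, optionally capped by the
    das.slotCpuInMHz / das.slotMemInMB advanced settings."""
    cpu = max(r[0] for r in vm_reservations)
    mem = max(r[1] for r in vm_reservations)
    if slot_cpu_cap_mhz is not None:
        cpu = min(cpu, slot_cpu_cap_mhz)
    if slot_mem_cap_mb is not None:
        mem = min(mem, slot_mem_cap_mb)
    return cpu, mem

# One VM with a 16384 MB reservation skews the memory slot size;
# capping it at 2048 MB yields a far less conservative slot count.
reservations = [(500, 1024), (1000, 16384), (250, 512)]
print(slot_size(reservations))                        # (1000, 16384)
print(slot_size(reservations, slot_mem_cap_mb=2048))  # (1000, 2048)
```

Note the trade-off the text implies: capping the slot size increases the number of available slots, but a VM whose reservation exceeds the cap then consumes multiple slots.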
das.perHostConcurrentFailoversLimit - 5.0, 5.1, 5.5
By default, HA will issue up to 32 concurrent VM power-ons per host. This setting controls the maximum number of concurrent restarts on a single host. Setting a larger value will allow more VMs to be restarted concurrently but will also increase the average latency to recover, as it adds more stress on the hosts and storage.

das.config.log.maxFileNum - 5.0, 5.1, 5.5
Desired number of log rotations.

das.config.log.maxFileSize - 5.0, 5.1, 5.5
Maximum file size in bytes of the log file.

das.config.log.directory - 5.0, 5.1, 5.5
Full directory path used to store log files.

das.maxFtVmsPerHost - 5.0, 5.1, 5.5
The maximum number of primary and secondary FT virtual machines that can be placed on a single host. The default value is 4.

das.includeFTcomplianceChecks - 5.0, 5.1, 5.5
Controls whether vSphere Fault Tolerance compliance checks should be run as part of the cluster compliance checks. Set this option to false to avoid cluster compliance failures when Fault Tolerance is not being used in a cluster.
das.iostatsinterval (VM Monitoring) - 5.0, 5.1, 5.5, 6.0
The I/O stats interval determines if any disk or network activity has occurred for the virtual machine. The default value is 120 seconds.

das.config.fdm.deadIcmpPingInterval - 5.0, 5.1, 5.5
Default value is 10. ICMP pings are used to determine whether a slave host is network accessible when the FDM on that host is not connected to the master. This parameter controls the interval (expressed in seconds) between pings.

das.config.fdm.icmpPingTimeout - 5.0, 5.1, 5.5
Default value is 5. Defines the time to wait in seconds for an ICMP ping reply before assuming the host being pinged is not network accessible.
das.config.fdm.hostTimeout - 5.0, 5.1, 5.5
Default is 10. Controls how long a master FDM waits in seconds for a slave FDM to respond to a heartbeat before declaring the slave host not connected and initiating the workflow to determine whether the host is dead, isolated, or partitioned.

das.config.fdm.stateLogInterval - 5.0, 5.1, 5.5
Default is 600. Frequency in seconds to log cluster state.

das.config.fdm.ft.cleanupTimeout - 5.0, 5.1, 5.5
Default is 900. When a vSphere Fault Tolerance VM is powered on by vCenter Server, vCenter Server informs the HA master agent that it is doing so. This option controls how many seconds the HA master agent waits for the power-on of the secondary VM to succeed. If the power-on takes longer than this time (most likely because vCenter Server has lost contact with the host or has failed), the master agent will attempt to power on the secondary VM.

das.config.fdm.storageVmotionCleanupTimeout - 5.0, 5.1, 5.5
Default is 900. When a Storage vMotion is done in an HA enabled cluster using pre-5.0 hosts and the home datastore of the VM is being moved, HA may interpret the completion of the Storage vMotion as a failure, and may attempt to restart the source VM. To avoid this issue, the HA master agent waits the specified number of seconds for a Storage vMotion to complete. When the Storage vMotion completes or the timer expires, the master will assess whether a failure occurred.

das.config.fdm.policy.unknownStateMonitorPeriod - 5.0, 5.1, 5.5, 6.0
Defines the number of seconds the HA master agent waits after it detects that a VM has failed before it attempts to restart the VM.

das.config.fdm.event.maxMasterEvents - 5.0, 5.1, 5.5
Default is 1000. Defines the maximum number of events cached by the master.

das.config.fdm.event.maxSlaveEvents - 5.0, 5.1, 5.5
Default is 600. Defines the maximum number of events cached by a slave.
That is a long list of advanced settings indeed, and hopefully no one is planning to try them all out on a single cluster, or even on multiple clusters. Avoid using advanced settings as much as possible, as doing so definitely leads to increased complexity, and often to more downtime rather than less.
Summarizing

Hopefully I have succeeded in giving you a better understanding of the internal workings of HA. I hope that this publication has handed you the tools needed to update your vSphere design and ultimately to increase the resiliency and uptime of your environment.

I have tried to simplify some of the concepts to make them easier to understand, but I acknowledge that some concepts are difficult to grasp and that the amount of architectural change that vSphere 5 and the new functionality that vSphere 6 have brought can be confusing at times. I hope, though, that after reading this everyone is confident enough to make the required or recommended changes.

If there are any questions, please do not hesitate to reach out to me via Twitter or my blog, or leave a comment on the online version of this publication. I will do my best to answer your questions.
Changelog

1.0.1 - Minor edits
1.0.2 - Start with VSAN Stretched Cluster in Use case section
1.0.3 - Start with VVol section in VSAN and VVol specifics section
1.0.4 - Update to VVol section and replaced diagram (figure 15)